Coin-flipping Statistics: Probability

I started to write a post that incorporated some aspects of basic statistics and it quickly blew up out of control. So, I’ve decided to break it up into a few smaller posts. This one is focused on calculating the probability of a simple outcome.

If two events are independent of each other, we can multiply their probabilities together. Suppose you fix eggs for breakfast 2/3 of mornings, it rains later in the day 1/4 of the time, and neither has any influence on the other (e.g., feeling like it might rain doesn’t make you more or less likely to cook eggs). Then the probability that both occur on a given day is the product: 2/3 \times 1/4 = 2/12 = 1/6, or about 17% of the time (1/4 of the 2/3). So, on average, a little more than one day a week it will rain and you cooked eggs for breakfast. (Given some data, we could also test whether the two really do appear to be independent by looking for deviations from 1/6.)
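The multiplication is easy to check with exact fractions. A minimal Python sketch (the variable names are mine, chosen for the example, not anything from a library):

```python
from fractions import Fraction

# Probabilities from the example: eggs 2/3 of mornings, rain 1/4 of days.
p_eggs = Fraction(2, 3)
p_rain = Fraction(1, 4)

# Independent events: the probability of both is the product.
p_both = p_eggs * p_rain
print(p_both)         # 1/6
print(float(p_both))  # about 0.167
```

Using `Fraction` instead of floating point keeps the arithmetic exact, which matches how the fractions are worked by hand in the text.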

If events are independent, you can multiply the probabilities that they do or do not occur together. If you fix eggs for breakfast 2/3 of the time and it rains 1/4 of the time, then on 1/4 of that 2/3 of days it rains and you fixed eggs. The other possible combinations can be calculated in the same way.

Since all outcomes must, by definition, add up to 100%, we can also calculate the probability that you did not (fix eggs for breakfast and have it rain) by subtracting from one: 1 - 1/6 = 5/6, or about 83% of the time. I placed the phrase in parentheses to indicate that the two did not both happen together. English can imply different meanings: you did not fix eggs and it did rain, which would be 1/3 \times 1/4 = 1/12, or about 8.3% of the time; or you did not fix eggs and it didn’t rain, which is 1/3 \times 3/4 = 3/12 = 1/4, or 25%. You have to be careful about the details when converting language statements into mathematical calculations, but it is possible, and rather than being a hindrance it can actually help you understand both the math and the language.

We expect the outcomes of different dice rolls or coin flips to be independent of each other (the first coin flip does not influence the outcome of the second, whether the coin is fair or biased), so we can multiply their probabilities together. The probability of getting “heads” or “tails” from a fair coin flip is 1/2. If we get heads three times in a row (h, h, h), the probability of this is 1/2 \times 1/2 \times 1/2 = 1/8, or 12.5%. That 1/8 is the likelihood of hhh given that the coin is fair. If the coin were two-headed (h/h), then the likelihood of hhh would be one, 100%.
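The same arithmetic in a couple of lines of Python (a quick sketch of the fair vs. two-headed comparison above; the names are mine):

```python
# Independent flips multiply: three heads in a row.
p_fair = (1 / 2) ** 3          # fair coin: 0.125
p_double_headed = 1 ** 3       # two-headed coin: heads is certain
print(p_fair, p_double_headed)
```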

A drawing of flipping a coin to make a decision, from Pietiäinen (1918) Tasavallan Presidentit.

The slightly trickier part is calculating the probability when there is a mix of outcomes. Often we don’t care about the order, so Thh, hTh, and hhT are all two heads and one tail. (I am using a capital T for tails to help visualize it, so it doesn’t get lost as easily in the hs; this is not meant to imply that I respect the tails side of the coin more than the heads side.) Each of these individual outcomes has a 1/8 probability of occurring. However, since there are three mutually exclusive ways to get two heads and one tail, we add the three individual probabilities together: 1/8 + 1/8 + 1/8 = 3/8, or about a 37.5% probability of this outcome. The likelihood of two heads and one tail, given that the probability of h is 1/2 and the probability of T is 1/2 (the two add up to one), is 3/8. In statistics, the vertical bar symbol “|” is used to indicate “given”, so we can write P(D|H) = 3/8. This is the probability (P) of the data (D = hhT in any order) given (|) the hypothesis (H) that the coin is fair. Under our hypothesis, the probability of heads is 1/2, P(h) = 1/2. This particular probability, P(D|H), is known as the likelihood (of the data given the hypothesis).
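One way to convince yourself of the 3/8 is to list every three-flip sequence and count. A short Python sketch that does exactly that (the variable names are mine):

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences of three fair coin flips.
outcomes = list(product("hT", repeat=3))

# Keep the sequences with exactly two heads, in any order.
two_heads = [seq for seq in outcomes if seq.count("h") == 2]
print(two_heads)   # the three orders: hhT, hTh, Thh

# Each sequence has probability 1/8; the mutually exclusive ways add.
prob = Fraction(len(two_heads), len(outcomes))
print(prob)        # 3/8
```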

There are two directions to go in here. One is looking at the likelihoods under different probabilities of heads, e.g., P(h) = 1/3 or 1/10. The other, which I will do first, is to look some more at the combinations of different outcomes.

When we are looking at a small number of outcomes, it is easy to write down the probabilities. What are all of the possible outcomes from two coin flips? hh, hT, Th, and TT. Each of these individual outcomes has a probability of 1/2 \times 1/2 = 1/4. However, the probability of getting one heads and one tails in either order, hT or Th, is 1/4 + 1/4 = 1/2. So getting an hT is twice as likely as hh. There are more ways to get a mix of outcomes, so we see a mix of outcomes more often. This is a simple statement, but it has a very deep meaning.

Let’s go to four coin flips. There is only one way to get hhhh, so the probability is P=(1/2)^4=1/16. There are four ways to get three heads and one tail, hhhT, hhTh, hThh, Thhh, so P= 4/16 = 1/4. How many ways are there to get two heads and two tails? hhTT, hThT, ThhT, TThh, ThTh, and hTTh, so six, with a probability of 6/16 = 3/8. This is starting to get tricky. As the combinations get larger, it is harder to keep track, and we are more likely to make a mistake when writing them down. How many ways are there to get four tails and three heads with seven coin flips? I won’t even try to write it out.
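A computer, however, can write them all out without complaint. A brute-force Python sketch that enumerates all 2^7 sequences and counts the ones with exactly three heads:

```python
from itertools import product

# Enumerate every sequence of seven fair coin flips (2**7 = 128 of them)
# and count those with exactly three heads (and therefore four tails).
count = sum(1 for seq in product("hT", repeat=7) if seq.count("h") == 3)
print(count)  # 35
```

This kind of exhaustive check is only practical for small numbers of flips, which is exactly why the binomial coefficient below is useful.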

There are two ways to do this. We can calculate the combinations directly using the equation for a binomial coefficient or we can look it up in Pascal’s triangle. I’m going to hold off on Pascal’s triangle because it connects to a lot of cool ideas and is worth its own blog post. The binomial coefficient is written as

{n \choose h} = \frac{n!}{h!T!}

(where n is the total number of coin flips, h is the number that come out heads, and T is the number that come out tails) and uses factorials (!, multiplying the integers from that number down to one, or down to two, since multiplying by one has no effect). This almost looks like n divided by h with the dividing bar missing ({n \choose h} versus \left(\frac{n}{h}\right)), but it is not; this notation is reserved specifically for the binomial coefficient. {n \choose h} is read as “n choose h“. It is essentially a bookkeeping equation that keeps track of all the possible ways to divide up the outcomes. It is built into a lot of calculators and programming languages. We can even type “7 choose 3” into the Google search bar and get an answer of 35. There are 35 orderings of three hs and four Ts in seven coin flips, ThhTThT for example. Plugging this into our equation

{7 \choose 3} = \frac{7!}{3!4!} = \frac{7\times6\times5\times4\times3\times2}{3\times2\times4\times3\times2} = \frac{7\times6\times5}{3\times2} = \frac{210}{6} = 35

So, there are 35 ways to flip a coin seven times and get heads three times (and tails four times). We still need to calculate the probability of an individual outcome, (1/2)^7, and then account for the different possible orders by multiplying by 35. The likelihood of this outcome is 35 \times (1/2)^7 \approx 0.273, or 27.3%.
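Python’s standard library has the binomial coefficient built in as `math.comb`, so the whole calculation fits in a few lines (a sketch; the names are mine):

```python
from math import comb

# "7 choose 3": the number of orderings of three heads in seven flips.
ways = comb(7, 3)
print(ways)  # 35

# Choosing which 3 flips are heads is the same as choosing which 4 are tails.
print(comb(7, 3) == comb(7, 4))  # True

# Likelihood: orderings times the probability of any single sequence.
likelihood = ways * (1 / 2) ** 7
print(round(likelihood, 3))  # 0.273
```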

By the way, you might ask why it is n choose h instead of n choose T. The choice is arbitrary and doesn’t matter; you can write it down either way and still get the same answer.

{7 \choose 3} = {7 \choose 4} = \frac{7!}{3!4!} = 35

I can’t resist pointing out that there is also a connection between the behavior of the binomial coefficient as the number of coin flips gets larger and the normal or Gaussian distribution, also known as the bell curve. However, to really talk about this will be a separate post. One cool thing about mathematics, and many fields when you get deep enough into it, is that different ideas start to connect to each other, sometimes in surprising ways, that give you a deeper understanding of the processes involved.

The general likelihood equation we are using is

P(D|H) = {n \choose h} P(h)^h P(T)^T.

Falling dice, Rhetos (2021)

In the case of fair coin flips P(h)=P(T)=1/2, but this also works when the two probabilities are unequal. The probability of a “one” coming up when rolling a die is 1/6. The probability it is not a one is 5/6. If we roll a die seven times (or seven dice once), the probability of three ones and four “not ones” is

{7 \choose 3} (1/6)^3 (5/6)^4 = 35\times 1/216 \times 625/1296 = \frac{21,875}{279,936}\approx0.078.
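The same general likelihood equation, evaluated exactly with Python fractions (a sketch of the die example above; the names are mine):

```python
from fractions import Fraction
from math import comb

# Unequal probabilities: a "one" with probability 1/6, "not one" with 5/6.
p_one = Fraction(1, 6)

# P(D|H) = (7 choose 3) * P(one)^3 * P(not one)^4
likelihood = comb(7, 3) * p_one**3 * (1 - p_one)**4
print(likelihood)         # 21875/279936
print(float(likelihood))  # about 0.078
```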

So, to wrap this part up, you can multiply and add probabilities together to calculate the likelihood of a particular outcome (in this case flipping coins or rolling dice) when there are two possible outcomes of each event (heads or tails, or fixing eggs for breakfast or not). (More than two outcomes is not that different but will be a separate post.) You can also double-check the calculations to see that the sum of all possible outcomes adds up to 1 or 100%.
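That last double-check is also easy to do by computer. A small sketch that sums the probabilities of every possible number of heads in seven fair flips (again with exact fractions, so the total is exactly one rather than approximately one):

```python
from fractions import Fraction
from math import comb

# Probabilities of getting 0, 1, ..., 7 heads in seven fair flips.
# Each is (7 choose k) orderings out of 2**7 equally likely sequences.
total = sum(Fraction(comb(7, k), 2**7) for k in range(8))
print(total)  # 1
```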

The way to really see this and understand the probabilities is, rather than only passively reading about examples, to make up a simple question, draw it out on a piece of paper, and go through the calculation yourself a couple of times until you are comfortable multiplying and adding probabilities together.
