These are draft notes extracted from subtitles. Feel free to improve them. Contributions are most welcome. Thank you!
Contents
So today, you’re in for a treat. You’re going to learn a theorem and I won't prove anything to you, you learn it by doing but this theorem is really important for all of statistics. So let’s dive in.And what this theorem really is about many coin flips. Almost generally it’s about the distribution of the sum of many things.So let’s study many things.
So let’s look at a coin. It has outcome 0-1 and we’ve already learned that the probability of the sum equals exactly K for N coin flips;; looks like this cryptic formula over here:N factorial divided by N minus K factorial andK factorial. And then we had some other term here,if the coin is fair, it is like 2 to the minus N. Right now, I’m just going to care about these coins over here. I won't care about the probability for the time being. I want you to help me construct a table for these counts as N gets larger.So we start with N equals 1, go all the way to N equals 4, obviously outcomes range from zero all the way to 4 at 4 coin flips. So here’s our table. It’s also quite obvious that these guys over here can never occur: if you flip the coin only once, the outcome can’t be 2. So we’re going to fill out this table over here together for the coin over here. Tell me, for N equals 1, what are the two values that go right over here,just those two values.
And the answer is 1 and 1,N factorial over N minus K factorial, K factorial - they are all ones,just 1, 1, 1. That was easy.
Let’s go to equal 2 and ask for the sum, what are these coefficients for an equal 2?
And the answer is 1, 211. Let’s plug-in zero.2 factorial is 2 divided by 2 factorial gives us 1 and zero factorial is 1. So that’s 1 over here. But if you plug-in 1SK we get 2 factorial over here which is 2 divided by and another 1 stays 2. So this is the correct answer.
Let’s go N equals 3. Now, it gets interesting.
The correct answer for this formula is 1, 3, 3, 1and you plug them in, you can see it’s correct.This is actually called a Pascal Triangle. If you have never seen this before, you can check it on Wikipedia. It’s actually quite easily calculated.
Each element over here is the sum of the value above plus the value on the left. This guy over here, sum of the value above, be zero plus value on the left. So let’s fill in the same for N equals 4.
The answer is 1, 4, 6, 4, 1. I used the Pascal trick but when you plug it in, you get exactly the same number over here.
So let’s look at these distributions over here. They are really interesting. In the first case,you get something like this: just the counts . In the second case, it looks more like this. In the third case and in the fourth case, we get something that looks really interesting, like this form over here. So here is a question for you. What happens when N equals 10,000? When I draw it like a diagram like this, will it look like that, like that, like that?Check the one that looks most plausible.
And it so turns out it looks like this over here.You can see it over here. It looks pretty much like a function that’s very high in the center and flattens out to the sides. So how does this relate to anything of interest?
So let’s look at a more complicated case. Say we have a three-sided die and it can come up with the outcomes 0, 1 or 2. And again I just want to count, I don’t want to work in probabilities just like before, I’m going to help you a little bit.First, I’d like to know what’s the possible largest sum, the largest number I put over here for anything that’s four tosses if each outcome can be between 0, 1 and 2. Just write in the box over here.
And if that outcome is eight, if I toss it four times, the largest value is two every single time. Two times four is eight.
So, let’s now go all the way to eight and compute those numbers, and it just the counts, not the probabilities. Give me the first three numbers right over here.
When we throw this die exactly once, what do you get is one, one and one.
Now the tricky part starts at n equals two an dI will construct this for you, you can take over for three and four. For the one of the left to have the sum equal to one, we stay at one,which is we add up the value above and the twoon the left because there’s no three outcomes.This guy over here is two, it’s the one above and the two in the left, there’s nothing over here but there is something over here.This guy here is three, it’s a sum over the one above and the two on the left gives us three.We go down to two, which is a one above zero and the two on the left, all the way back to one.And this already looks very much like the type distribution we talked about like this.
Now, it’s your turn. Give me all those values over here. There’s six of them and just to help you a little I give you one of them. This over here is a six. Please fill them in.
And we would do just like before, we take thevalue from above, plus the two left ones. So one, two plus one is three, three plus two plus one is six, two, three and two with seven, six, three and one.
So now for the final question, there’s nine total values to be filled in, I give you two of them,four over here and there is a sixteen over here,so please fill in the seven remaining values.
And as before we take the one above plus the two left, one here, these two are four, six, three,one is ten, six, seven, three is sixteen, six,seven, six is nineteen, sixteen, back to ten,four and one.
And when we graph this, it looks about like this. It’s a big function that has the highest value at the center, nineteen and then falls down in some interesting way that looks awfully much like the way the previous exercise fell down.So even for this different experiment with the die, we get about the same phenomenon that the sum of those seems to have a distinct high probability for what’s the most likely outcome four and then it falls off in some interesting way.So this is effectively what’s called the central limit theorem. And this sounds complicated and in fact it is, but here is the simple version to remember. Suppose you’re doing many experiments. Each of these experiments over here have some distribution, we don’t care,it has to just be reasonable and you sum those all up for some very large number and if N is large then the joint distribution of the sum of those with approach a function it looks like this.That’s called a Gaussian. We discuss the Gaussian in depth. It has its own formula and typically if N is as big as thirty that is maybe your test thirty people whether they have cancer or not, that’s usually good enough to approximate this really complicated binomial over here with this relatively simple Gaussian function. We’ll learn more about this later,but the key thing to remember is we take many experiments, you add them up and outcomes a function just like this. That’s called the central limit theorem.
Note:
If you didn't understand the lesson, try this explanation. The data on the tables represents the number of possibilities where sum of individual outcomes is equal to (e.g., for the possibility (Heads, Heads, Heads, Tails, Tails, Heads, Tails), where Heads = 1 and Tails = 0, the sum of the individual outcomes (each flip) is equal to 4, for flips). For the first table, the values in the cells represent the number of possibilities where sum of coin flips is equal to . Remembering that the individual outcomes of a fair coin can be either 0 (Tails) or 1 (Heads).
k=0 | k=1 | k=2 | k=3 | k=4 | |
n=1 | 1 | 1 | |||
n=2 | 1 | 2 | 1 | ||
n=3 | 1 | 3 | 3 | 1 | |
n=4 | 1 | 4 | 6 | 4 | 1 |
For
, i.e., you have flipped your coin just once, you have the following possibilities:{0}
{1}
If you sum up the individual outcomes for each possibility, you will get:
Sum({0}) = 0
Sum({1}) = 1
So, if you look at the table, for
, you will see that there is one possibility where the sum of the individual outcomes is 0 and one possibility where the sum of the individual outcomes is 1.For
, i.e., you have flipped your coin twice, you have the following possibilities:{0, 0}
{0, 1}
{1, 0}
{1, 1}
If you sum the individual outcomes for each possibility, you will get:
Sum({0, 0}) = 0
Sum({0, 1}) = 1
Sum({1, 0}) = 1
Sum({1, 1}) = 2
So, if you look at the table, for
, you will see that there is one possibility where the sum of the individual outcomes is 0, two possibilities where the sum of the individual outcomes is 1, and one possibility where the sum of the individual outcomes is 2.That's a naive way to obtain the numbers on the table. But for a binomial distribution (coin flipping would be an example of such distribution) you can use the combination formula and here's why:
The combination or binomial coefficient, (read as choose ) is defined as follows:
if and if
A combination,
, represents the number of ways you can choose elements from a set of elements. Making if makes sense, since if you have 3 elements in a set ({a,b,c} for example) and try to combine them in sets of 4 distinct elements ( size sets), you simply can't do it. You don't have enough elements in your size set.Since a coin flip will result on either Heads (
) or Tails ( ), the sum of the individual outcomes for flips is equal to the number of Heads. For a coin, the number of possibilities where you get Heads in flips is equal to , as you have learned in previous units (Unit 18: "Binomial Distribution"). That's because of the following:Let's say you throw a coin
times. You could label each flip according with the order you have performed the flips, e.g.:{1, 2, 3, 4, 5, 6, 7, 8, 9, ... , n}
So, 1 represents the 1st flip, 2 represents the 2nd flip and so on, until
, which represents the n-th flip.Let's say you want to know how many different possibilities you could have where the sum of the flips is equal to 2. For this particular case you would need two flips equal to 1 and the remaining flips equal to 0. One possibility would be having the 1st and 2nd flips equal to Heads and the remaining ones equal to Tails, another one would be having the 2nd and the 3th flips equal to Heads and the remaining ones equal to Tails, another one 5th and 9th, another one 7th and 11th, etc, i.e., every single combination.
So the problem boils down to how many pairs ({1, 2}, {2, 3}, {5, 9}, {7, 11}, ..., etc) you can choose from a set of
elements, or . This would be equal to the number of possibilities where the sum is equal to 2. Note that you can generalize this to elements, or the number of ways you could choose elements from a set of elements, or , which would be the number of possibilities where the sum is equal to .As has been pointed out, the number of possibilities where the sum is equal to
is for the binomial distribution, but for a 3-sided dice this isn't true. The distribution for a 3-sided dice isn't binomial. Remember binomial, bi means two, i.e., Heads or Tails, Raining or Not Raining, Positive or Negative, etc. A 3-sided dice has three possible individual outcomes, {0, 1, 2} and therefore for a 3-sided die, does not represent the number of possibilities for which the sum of the individual outcomes is equal to .From unit 19A-7 or "Number of individual outcomes" onward you can't use
. A way to fill in the table for the 3-sided dice is using the naive way I've described for coin flipping at the beginning.Let's use this naive method for the dice example then. The columns of this table represent the number of possibilities that result in a sum of individual outcomes equal to
. The individual outcomes of a 3-sided dice can be 0, 1 or 2.k=0 | k=1 | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | |
n=1 | 1 | 1 | 1 | ||||||
n=2 | 1 | 2 | 3 | 2 | 1 | ||||
n=3 | 1 | 3 | 6 | 7 | 6 | 3 | 1 | ||
n=4 | 1 | 4 | 10 | 16 | 19 | 16 | 10 | 4 | 1 |
For
, i.e., you have thrown your dice just once, you have the following possibilities:{0}
{1}
{2}
If you sum up the individual outcomes for each possibility, you will get:
Sum({0}) = 0
Sum({1}) = 1
Sum({2}) = 2
For
, i.e., you have thrown your dice twice, you have the following possibilities:{0, 0}
{0, 1}
{0, 2}
{1, 0}
{1, 1}
{1, 2}
{2, 0}
{2, 1}
{2, 2}
Resulting in the following sums:
Sum({0, 0}) = 0
Sum({0, 1}) = 1
Sum({0, 2}) = 2
Sum({1, 0}) = 1
Sum({1, 1}) = 2
Sum({1, 2}) = 3
Sum({2, 0}) = 2
Sum({2, 1}) = 3
Sum({2, 2}) = 4
So, you have one possibility where the sum is 0, two possibilities where the sum is 1, three possibilities where the sum is 2, two possibilities where the sum is 3 and one possibility where the sum is 4. It's a lot of hard work doing this for n = 3 or n = 4, but you don't actually have to do it, since there is a pattern on this table, as shown by the instructor:
k=0 | k=1 | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | |
n=1 | 1 | 1 | 1 | ||||||
n=2 | 1 | 2 | 3 | 2 | 1 | ||||
n=3 | 1 | 3 | 6 | 7 | 6 | 3 | 1 | ||
n=4 | 1 | 4 | 10 | 16 | 19 | 16 | 10 | 4 | 1 |
If you recall the lecture, the pattern is that each element is equal to the sum the element immediately above and the two elements to the left of the element immediately above. For example, Pascal's Simplice and that's why this works.
. The pattern of the elements happens to be that of aWhat's the insight you get from all this work? If you look at the table for the 3-sided die:
k=0 | k=1 | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | |
n=1 | 1 | 1 | 1 | ||||||
n=2 | 1 | 2 | 3 | 2 | 1 | ||||
n=3 | 1 | 3 | 6 | 7 | 6 | 3 | 1 | ||
n=4 | 1 | 4 | 10 | 16 | 19 | 16 | 10 | 4 | 1 |
More specifically the last row,
, where you have the following values: 1, 4, 10, 16, 19, 16, 10, 4, 1. If you plot these values in a barchart:Note the "bell shape". The idea of the central limit theorem is that if you increase the number of flips
and change the plot's scale (the number of bars will increase and the marks for the value of will get narrower and narrower), more and more this shape will resemble the shape of a Gaussian function. If you keep increasing and re-scaling the graph until goes to infinity, the shape of the graph will be exactly the shape of a Gaussian function.Let's go a step further and replace
by on the x axis. This doesn't change the shape of the graph, since dividing by , a constant, is just a normalization. It's equivalent to a change of scale. What's ? is sum of the individual outcomes for throws of a die. is the number of throws, therefore is the mean of the individual outcomes for throws, or the mean over a sample of size . Let's also divide the number of possibilities where the sum of the individual outcomes is equal to (y axis) by the total number possibilities for throws ( for a 3-sided die). If you do that, you will get the frequency (or probability) of the event (the sum of the individual outcomes is equal to ) instead of the number of times the event happens. Now we really have a probability distribution. A probability distribution of the mean over a sample of size . This is also a normalization and doesn't change the shape of the graph.What does this mean? It means that the value of the mean over the sample has a "bell shape" distribution.
Note also that the probability distribution of 3-sided die is uniform, not Gaussian. A deeper insight is that if you get a non-Gaussian distribution and calculate the mean over a sample of size
, the probability distribution of the mean over the sample gets closer and closer to a normal distribution as you increase the size of the sample ( ). When goes to infinity, the distribution becomes exactly a Gaussian distribution.