
20.  The Normal Distribution

These are draft notes extracted from subtitles.


01 Maximum Probability

What you are about to see is one of the most transformative parts of modern statistics, and it uses things the likes of which we've never seen before. We start out with the binomial distribution that you're familiar with from our last unit, and then we move into the central limit theorem, which basically means we take the number of coin flips to infinity. From that, we arrive at the normal distribution, which is the basis of so much in statistics: all of testing and confidence in results is defined through the normal distribution. The reason why this matters is that much of what we've done with coin flips had one or two coin flips, but in statistical experiments you often have thousands of patients or thousands of data points, and then using the normal distribution as an approximation to the binomial distribution is much more practical.

So let's start with our now well-established formula for binomial distributions, where n is the number of coin flips, k is how often the coin comes up heads, and p is the probability that the coin comes up heads, usually 0.5. I want to graph this function for fixed n = 20, so k can go from 0 all the way to 20. Instead of letting you compute this thing for all those different values of k, I'm going to ask a different question. Suppose you have a fair coin. For which value of k do you think this expression takes on its maximum value? I know you can't really know this, but with some thought I believe you'll arrive at the correct answer.


02 Maximum Probability Solution

I just programmed and ran the experiment, and the answer is 10. The reason is that the number of combinations to place 10 positives and 10 negatives into our list of 20 is larger than for any other split. This term over here is maximized when k is exactly half of n, so 10.
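The maximization can be checked directly. What follows is a minimal sketch, not the lecturer's actual program: the helper name `binomial_pmf` is made up for illustration, and it assumes the standard binomial formula with n = 20 and a fair coin.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.5  # 20 flips of a fair coin, as in the question above

# Evaluate the formula for every possible number of heads, 0 through 20.
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Pick the k whose probability is largest.
best_k = max(range(n + 1), key=lambda k: probabilities[k])
print(best_k)  # prints 10
```

Because p = 0.5, the factor p^k (1 - p)^(n - k) is the same for every k, so the maximum is decided entirely by the number of combinations.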

03 Shape

The other interesting thing is that things fall off in an interesting fashion as you deviate from 10, through 11, 12, 13, 14, all the way to 0 or 20. We get a curve that looks a bit like this. This curve is often called a bell curve because it's easy to think of it as a church bell that moves left and right and rings.

I did a related experiment. I wrote a piece of software to flip a coin 1000 times, and if you did the last optional unit on programming, you wrote a piece of software to flip a coin 1000 times too. From that, I looked at the empirical frequency, which is the count of heads divided by 1000, so it scales between 0 and 1. I call this thing an experiment: I flip the coin 1000 times, and out comes one single number, which is the ratio of heads to the total number of flips. It should be 0.5 in the ideal case, but often it's a little bit off 0.5. I repeated this experiment 1000 times, which means I've got 1000 samples of this ratio. When I do this, I get a whole bunch of means. When I run a histogram over those, I might see a curve. What shape do you think the curve has? Is it going to be like this, like this, like this, or like this? All are centered on 0.5. Which one do you think it is?


04 Shape Solution

And it's this one. Let me show you. Here's a typical one, and I apologize that the axis over here can't really be read, but you can take it on faith that the center is 0.5, and you can see the characteristic bell curve for this simple coin-flipping experiment. For this run the mean was 0.50006. If I run it again, I get a different sample. There is some randomness involved, as this bar over here illustrates, but through the randomness you can clearly see the bell-shaped curve that flattens off on the sides.
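The experiment can be reproduced roughly as follows. This is a sketch under stated assumptions, not the program used in the lecture: the fixed seed, the bin width of 0.01, and the text-based histogram are all arbitrary choices made here for illustration.

```python
import random
from collections import Counter

FLIPS = 1000        # coin flips per experiment
EXPERIMENTS = 1000  # how often the whole experiment is repeated

rng = random.Random(0)  # fixed seed (an assumption) so reruns give the same sample

def count_heads(num_flips):
    """Flip a fair coin num_flips times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(num_flips))

head_counts = [count_heads(FLIPS) for _ in range(EXPERIMENTS)]
ratios = [heads / FLIPS for heads in head_counts]
mean_ratio = sum(ratios) / EXPERIMENTS

# Group the experiments into bins that are 10 heads (0.01 in ratio) wide.
histogram = Counter(heads // 10 for heads in head_counts)
for bin_index in sorted(histogram):
    low = bin_index * 10 / FLIPS
    bar = "#" * (histogram[bin_index] // 5)  # one '#' per 5 experiments
    print(f"{low:.2f}-{low + 0.01:.2f} {bar}")
print(f"mean ratio: {mean_ratio:.4f}")
```

The bars should be longest near 0.5 and shrink quickly toward both sides, which is the bell shape described above. Changing the seed gives a different sample with the same overall shape.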

05 Better Formula

The question really is: can we find a better formula for this bell-shaped curve? The answer is, well, take your guess.


06 Better Formula Solution

The answer is a resounding yes. Even more so, what I'm going to show you doesn't just apply to binomial distributions with fair coins. It applies to almost any distribution that is sampled many, many times, which is a very deep statistical result. I will construct for you the formula that is being used.

07 Quadratics

I will define for you a normal distribution with a specific mean, often called µ (the Greek letter mu), and a variance, often called σ². We already know that variance is a quadratic expression. In normal land we often use µ and σ². Let's do this. The very first element is that for any outcome x we write the quadratic difference between this outcome x and µ, that is, (x - µ)². This is indeed a function of x. So, look at this. Here are four possible hypotheses of what this function might look like. In each case µ is on the right side somewhere. The horizontal axis is x, and we're graphing f(x). The first I'll give you is a triangular function. The second is a quadratic function. The third one is a negative quadratic function. And the fourth one is a quadratic function that doesn't quite touch the axis at µ. So, which one in your opinion best describes this formula over here?


08 Quadratics Solution

I would submit that it's this one over here. The reason is that this expression is 0 when x = µ. As a result, it can't be the fourth of these choices. It's never negative, so it can't go down into the negative area, which rules out the negative quadratic. It's quadratic, hence a triangular function like this doesn't make sense, so it must be this one over here.

09 Quadratics 2

The next thing I'll do is divide it by σ². Without telling you why I'm doing this, I want to see what the effect is. Suppose σ² = 4. That means we have a variance of 4 and a standard deviation of 2. I've already given you the quadratic function when it isn't divided by σ², which is the same as saying σ² = 1. What I'd like to know is whether our new version, where σ² = 4, makes this quadratic wider or narrower, assuming that this is our new function f(x). So, pick one of the two, or if you think it stays the same, pick the third.


10 Quadratics 2 Solution

The answer is it makes it wider. To see why, note that dividing by σ² affects the vertical dimension, the output, and scales it down by a factor of 4. Whatever we had before gets dragged down by a factor of 4. That means this point over here finds itself here. This one over here finds itself here. That widens out the quadratic. Observe that large variances yield wide quadratics. Small or tight variances yield sharp quadratics.
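A few numbers make the widening concrete. In this small sketch the function names and the choice µ = 0 are made up for illustration. It evaluates (x - µ)²/σ² at the same point for σ² = 1 and σ² = 4, and also reports how far from µ the curve has to travel before it reaches height 1.

```python
import math

def scaled_quadratic(x, mu, sigma_sq):
    """The quadratic (x - mu)^2 divided by sigma_sq."""
    return (x - mu) ** 2 / sigma_sq

def half_width_at_height(height, sigma_sq):
    """Distance from mu at which the scaled quadratic reaches `height`."""
    return math.sqrt(height * sigma_sq)

mu = 0.0  # hypothetical mean, chosen only for illustration
for sigma_sq in (1.0, 4.0):
    value_at_2 = scaled_quadratic(2.0, mu, sigma_sq)
    width = half_width_at_height(1.0, sigma_sq)
    print(f"sigma^2 = {sigma_sq}: f(2) = {value_at_2}, reaches height 1 at distance {width}")
```

With σ² = 4 the value at x = 2 drops from 4 to 1, and the curve needs twice the distance from µ to reach the same height, which is what "wider" means here.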

11 Quadratics 3

In particular, if we now look at the quadratic over here, which is much tighter, which of the following potential values of σ² do you think best represents this narrow function, provided that this is the quadratic that corresponds to σ² = 1? Check one of those four: 4, 1, ¼, or 0.


12 Quadratics 3 Solution

It follows it's got to be something like a quarter. We already learned that 4 widens the quadratic, so that can't be it. One is already shown over here. Zero makes no sense because we'd have division by 0, which you can think of as a quadratic that shoots up almost like a straight line, but honestly it doesn't make any sense. A quarter is the one that really describes this particular quadratic best. Now, that's great. Now we understand this expression over here.

13 Quadratics 4

Let's go further, and let's now take this function and multiply it by -½. Again, I ask you what the effect is. If this is your original quadratic, then what do we get? We already know that it's going to flatten it, because you are halving the f value, but are we going to get something like this or perhaps something like this? Pick one of those two choices.


14 Quadratics 4 Solution

And quite interestingly, we inverted the sign, so all of a sudden the function is negative. Quite obviously, green is the correct answer. So, now we have a quadratic that points down into the negative space, whose maximum value is 0 and which is otherwise strictly negative. That's this function f over here.

15 Maximum

And now I'm going to take the most extreme of all steps. I'm going to make this the exponent of the e function. Remember, the inner argument is a quadratic that points down. Its width depends on σ², and its mean is µ. I call the result f(x). Where is f(x) maximized? I'll give you several choices: x = µ, x = 0, x = -∞, or x = +∞. Where will this thing be the largest?


16 Maximum Solution

To understand the solution, it's useful to draw the exponential function. e⁰ is 1, and then it goes up exponentially to really large numbers. If you've ever heard Ray Kurzweil talk about the future of society, you've seen these curves: everything goes up exponentially. Everything is just exponential. Further, if you go back in time to negative values, this thing slowly drifts down to 0. Of course, that's not very exciting, so we never talk about the exponential in the negative space. However, it turns out that all the arguments of our exponential are at best 0 and otherwise negative. Because the exponential is monotonic, that is, the larger its argument, the larger its value, it ought to be maximized where this thing over here is the largest. And where is that the case? Well, it's exactly where x - µ hits 0, that is, at x = µ.

17 Maximum Value

Let me ask you another question. What is the value of this function at the point where it's maximal, which is x = µ? That's the way to write this. Compute for me in your head what this thing will be when x = µ.


18 Maximum Value Solution

Quite interestingly, even though this formula looks complex, it has a really easy answer: when x = µ, this thing here is 0. That makes the entire exponent 0, and e⁰, like any positive value raised to the power 0, is going to be 1.

19 Minimum

Next I'd like to know where f(x) is minimized. For what value of x would we get the smallest possible value of this entire expression over here? Again, pick one or more of those choices over here.


20 Minimum Solution

The answer is now ±∞. If you put a really large positive or negative value in, the difference to any µ will be enormous. The square will be even more enormous. Therefore, this entire expression on the right side will be huge. Put a minus sign in front of it, and you have a hugely negative number. You have e^-∞. e^-∞ drives the e curve all the way to the left, where it ends up being minimized.

21 Minimum Value

In fact, what do you think is the value of this where x = ∞?


22 Minimum Value Solution

The answer is 0. As x goes to infinity, this expression goes to negative infinity. The exponential of negative infinity converges to 0. So, now we've got basically the normal distribution function.
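Both extremes can be checked numerically. This is a minimal sketch: the name `unnormalized_bell` and the values µ = 3 and σ² = 4 are assumptions chosen for illustration, and "infinity" is approximated by a point 100 units away from µ.

```python
import math

def unnormalized_bell(x, mu, sigma_sq):
    """exp(-1/2 * (x - mu)^2 / sigma_sq), without any normalizing factor."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma_sq)

mu, sigma_sq = 3.0, 4.0  # hypothetical parameters

at_mean = unnormalized_bell(mu, mu, sigma_sq)            # exponent is exactly 0
near_mean = unnormalized_bell(mu + 1.0, mu, sigma_sq)    # exponent is slightly negative
far_right = unnormalized_bell(mu + 100.0, mu, sigma_sq)  # exponent is hugely negative
far_left = unnormalized_bell(mu - 100.0, mu, sigma_sq)

print(at_mean, near_mean, far_right, far_left)
```

The value at x = µ is exactly 1, every other point gives something smaller, and far from µ in either direction the value is so small that floating point arithmetic rounds it to 0.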

23 Normalizer

We have a function f that assumes the value 1 when x = µ and that goes to 0 when x goes to ±∞. It so happens that it looks like a bell curve. The fact that it looks like a bell curve is not entirely obvious, but you have to take my word for it. This, which I would consider a relatively simple formula, describes the limit of making infinitely many coin flips. In fact, it describes the limit of computing a mean over any set of experiments. This is a very powerful result. No matter what you do, when you drive n to very large numbers you get a bell curve like this.

There is one flaw here, and I'll tell you about the flaw without going into detail. The area underneath this curve doesn't always add up to 1. In fact, without proof, it adds up to √(2πσ²). The reason why this matters is deeply buried in probability theory, but it turns out we want all these areas to add up to 1, just as much as we wanted a coin flip and its complement to add up to 1. The true normal distribution is normalized by just the inverse of this thing over here, 1/√(2πσ²). So, that is the normal distribution of any value x, indexed by the parameters µ and σ². This is a very deep piece of mathematics. Now we will apply it a little bit, for you to practice how the normal distribution looks in the field.
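The area claim is stated without proof, but it can at least be checked numerically. The sketch below is an illustration, not a derivation: it approximates the area under exp(-½ (x - µ)²/σ²) with a simple midpoint rule and compares it with √(2πσ²). The function names, the parameters µ = 1 and σ² = 4, and the integration range of ten standard deviations on each side are all assumptions made here.

```python
import math

def unnormalized_bell(x, mu, sigma_sq):
    """exp(-1/2 * (x - mu)^2 / sigma_sq), without the normalizer."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma_sq)

def area_under(f, low, high, steps=100_000):
    """Midpoint-rule approximation of the area under f between low and high."""
    width = (high - low) / steps
    return sum(f(low + (i + 0.5) * width) for i in range(steps)) * width

mu, sigma_sq = 1.0, 4.0  # hypothetical parameters
sigma = math.sqrt(sigma_sq)

# Beyond ten standard deviations the curve contributes essentially nothing.
raw_area = area_under(lambda x: unnormalized_bell(x, mu, sigma_sq),
                      mu - 10 * sigma, mu + 10 * sigma)
normalizer = math.sqrt(2 * math.pi * sigma_sq)
normalized_area = raw_area / normalizer

print(raw_area, normalizer, normalized_area)
```

The first two printed numbers should agree closely, and the third should be very close to 1, which is the point of dividing by √(2πσ²).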


24 Formula Summary

So, here is our normal distribution again. I'm going to write it as "exp" for exponential: exp{-½ (x - µ)²/σ²}. The truth is, when you're new to this, this looks really cryptic. When you've been with statistics for many years, as I have been, you wake up in the middle of the night and you can recite this formula. It's as normal as getting breakfast in the morning or having a beer after dinner. What I want to get into your brains is not the complexity of the formula. I want you to really understand how this formula is constructed. I want you to understand the quadratic penalty term for deviations from the expectation, the mean, of this expression. Then the exponential that squeezes it back into the curve. That's basically what it is.

We can draw values from this normal distribution just the same way as we flipped coins before. The way to look at this is that any value x has this probability up here. This is nothing else but a notation for the probability of x under a normal distribution with µ and σ². So, a value x that has a bar twice as high as some other value x' will have twice as much of a probability of being drawn. Now, obviously, the normal has an entire continuous space of outcomes, and obviously that renders each individual outcome of probability 0. But, in essence, you can think of the height of this thing over here as being proportional to the probability that this value is being drawn.
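Drawing values and the "twice as high, twice as likely" reading can be tried out as follows. This is a sketch under assumptions: the parameters µ = 5 and σ² = 4, the seed, the sample size, and the window width of 0.2 are arbitrary choices, and `count_near` is a helper invented here. Note that Python's `random.gauss` takes the standard deviation σ, not the variance σ².

```python
import math
import random

rng = random.Random(0)   # fixed seed (an assumption) for repeatable draws
mu, sigma_sq = 5.0, 4.0  # hypothetical parameters
sigma = math.sqrt(sigma_sq)

samples = [rng.gauss(mu, sigma) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

def count_near(center, half_window=0.1):
    """Number of samples that fall within half_window of center."""
    return sum(abs(s - center) < half_window for s in samples)

# The curve is half as high as its peak at distance sigma * sqrt(2 ln 2) from mu.
half_height_x = mu + sigma * math.sqrt(2 * math.log(2))
ratio = count_near(mu) / count_near(half_height_x)

print(sample_mean, sample_var, ratio)
```

The sample mean and variance should land near µ and σ², and roughly twice as many draws should fall in the small window around the peak as in an equally small window where the curve is half as high.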


25 Central Limit Theorem

Let's now talk about the central limit theorem. On the left, I'll give you certain things we can do. We can do one coin flip. We can do many coin flips and compute the mean outcome, that is, the ratio of heads to the total number of flips. And we can do infinitely many coin flips and once again calculate the mean. The outcomes of those could be described by any of the following: formula 1, formula 2, or formula 3. These formulas all look very different, but they really all derive from the very simple setting of a coin flip. We have a binomial formula, we have a normal distribution with a specific definition of the mean equal to p and the variance equal to p(1-p), and we have p itself. Take any of those numbers over here and place them on the right side to indicate which of these expressions corresponds to which of these outcomes.


26 Central Limit Theorem Solution

And of course p is a single coin flip. This expression over here is many, but finitely many, coin flips, and this one is the limit of it, where we have infinitely many coin flips. What's interesting is that the central limit theorem really governs the transition from one coin flip to many, many coin flips, all the way to infinitely many coin flips. The reason why the math looks different is nontrivial, and there is a very lengthy proof of the central limit theorem that I'll spare you. What you should learn is that this exponential function over here captures the distribution of a possible mean. We transition from a discrete space of finitely many outcomes to a space of infinitely many outcomes with a somewhat different property: it is defined over arbitrary x, which is continuous, whereas this is the discrete case. But that is the essence of the central limit theorem. Now, it turns out this transition to a normal distribution works not just for coin flips but for many other distributions, but that is outside the scope of this class.
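How close the two formulas get for a large but finite number of flips can be seen numerically. The sketch below compares the binomial probability of k heads in n = 1000 fair flips with the normal curve evaluated at k. It is an illustration only, and it works on the scale of the head count, assuming the standard parameters for that scale: mean np and variance np(1-p). The function names are invented here.

```python
import math
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma_sq):
    """Normalized bell curve: exp(-1/2 (x - mu)^2 / sigma_sq) / sqrt(2 pi sigma_sq)."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma_sq) / math.sqrt(2 * math.pi * sigma_sq)

n, p = 1000, 0.5
mu = n * p                  # assumed mean of the head count
sigma_sq = n * p * (1 - p)  # assumed variance of the head count

# Largest disagreement between the two formulas over all possible head counts.
largest_gap = max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma_sq))
                  for k in range(n + 1))

print(binomial_pmf(500, n, p), normal_pdf(500, mu, sigma_sq), largest_gap)
```

At the peak both formulas give a value of about 0.025, and the largest gap anywhere is tiny by comparison, which is why the normal curve is a practical stand-in for the binomial when n is large.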

27 Summary

What I've shown you in this class goes from a coin flip to a binomial distribution all the way to a normal distribution, and you might think that this was challenging, and indeed it was. As it turns out, you can treat all these things about the same. In fact, if you're a medical doctor and you have one patient, you might think of it as a coin flip. If you have 10 patients, you might think of it as a binomial distribution. If you do what's normally done when you test, say, a new drug, and you have, let's say, 10,000 patients, then this thing over here is a beautiful and very compact representation of it. Otherwise, it would be almost impossible to compute. That's the purpose of the normal distribution for the sake of this class. As you go forward and look into hypothesis testing and confidence intervals, you don't do this for these relatively complicated expressions over here. You just do it for the normal distribution, which I think is fairly easy to compute. Welcome to the world of normal distributions.