These are draft notes extracted from subtitles. Feel free to improve them. Contributions are most welcome. Thank you!
Today, we'll be turning things upside down and applying what we've learned so far about estimation and probability to solve some really basic statistical problems that occur every day. Today, we talk about confidence intervals, and in the next class we talk about testing hypotheses. These are fundamental concepts that every applied statistician has to use pretty much every day. You'd better listen. This is my first example. Imagine it is just about election day, and I know not all countries have election days, but I wish they did and I wish they were fair. In the United States, there are hundreds of millions of people who are eligible to vote, and of those a fraction go and vote, but as a statistician you often wish to understand what the outcome of election day is before the actual vote happens. One thing you could do is ask every voter and then just report the results. Is this a good idea?
Of course not. The effort involved in asking every voter is tantamount to running the entire election itself, which in a small group of people, maybe five or ten, is feasible, but in a group of hundreds of millions of people isn't.
Here's what statisticians do. They choose a random sample. The sample is a selection of, hopefully, randomly drawn people from this pool who are representative of the pool at large. From this sample, they might derive an estimate of how people are likely to vote. They might just report that estimate, but in practice, what they do is report the estimate plus a margin of error. This margin of error means that what's really being given back is not a single number like 60%, but an interval; we call this the confidence interval. Look at these numbers for the blue party, or party A, an estimate of 60% with a margin of error of 3%, and tell me what you think the lower bound and upper bound of the confidence interval are, in percent.
And the answer is simple. You take 60%. You subtract 3 to get 57 as the lower bound and add 3 to get 63 as the upper bound. The confidence interval of 57-63 means we are fairly confident that, given our current sample, election day will give party A an outcome within this range.
This is a very fundamental concept in statistics. It applies to coin flips and many of the other examples we discussed before. A candidate in an election might have a true chance that any given voter votes for him. Let's call this chance p, the same as for the coin flip, and of course if p is larger than 0.5 in a two-person runoff, that person will win in most cases, not always. But as statisticians we can't assess the true chance, so what we do is form a sample. In coin flipping, we flip n coins. In elections, we ask n randomly chosen people. As we know, this gives us an estimated mean. It also gives us a variance and, therefore, a standard deviation. What's new now is the confidence interval, and the confidence interval is not the same as the variance. In fact, set the variance aside for a moment. What the confidence interval says is this: based on the outcome, we have a belief about the parameter µ. There is a best guess we can get, which is usually the maximum likelihood estimate, but we are not quite certain, so we're going to assess a wider range in which we believe, with high probability, the outcome will fall. Very often this range is defined so that there is a 95% chance that the final outcome falls into it. That will take some work to compute. So let's dive in.
Let's say election day is coming up again. In this example, I would like you to get an intuition for how confidence intervals behave. Here again are all the voters, and there are multiple institutions, Institution 1, Institution 2, and Institution 3, that all sample the voters to derive a prediction of what the outcome of election day will be. For simplicity, there are only 2 parties, A and B, and say Institution 1 finds in a sample of 20 that 11 say A and 9 say B. What do you think the maximum likelihood estimate is going to be, based on this information? I'm asking here for the probability that a given voter votes for party A.
And the answer is 0.55.
Now let's say Institution 2 does the same thing, just more rigorously, and it asks 5 times as many people. So 55 say A and 45 say B. Once again, what is the probability for A using the maximum likelihood estimator?
And the answer is 0.55 as before.
But there is a difference. Institution 1 used many fewer data points than Institution 2. In fact, suppose someone, say Institution 3, went all the way to ask 2000 potential voters, yet derived the exact same 0.55. Then all the predictions look alike, but if you were to trust one of the three the most, which one would you choose? Choose exactly one of the three.
And I would say it's the third one because they asked the most people.
To see why, let's take the extreme example of an institute that only asked one person. Now, it's down to chance; A is a bit more likely than B. They will be forced to say either that all voters will vote A or that all voters will vote B, depending on the outcome of this one question. In this case, because they encountered someone saying A, the estimated probability of anybody else voting A will be 1. Would you trust this company that only asked one person? Of course not. The more data that's sampled, assuming that the sample is fair and independent, the more trust you have, and more trust means a smaller confidence interval.
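As a rough sketch of the point above, the maximum likelihood estimate is just the fraction of the sample that says A, so very different sample sizes can report the same number. The function name `mle_proportion` is made up for this illustration, and the count of 1100 out of 2000 is an assumption chosen to reproduce the 0.55 mentioned in the lecture.

```python
def mle_proportion(votes_for_a, sample_size):
    """Maximum likelihood estimate of p: the fraction of the sample saying A."""
    return votes_for_a / sample_size

# (votes for A, sample size). The 1100/2000 split is an assumed count
# that yields the same 0.55 estimate; the lecture only gives the 0.55.
samples = {
    "one person": (1, 1),
    "Institution 1": (11, 20),
    "Institution 2": (55, 100),
    "Institution 3": (1100, 2000),
}

for name, (votes_for_a, n) in samples.items():
    estimate = mle_proportion(votes_for_a, n)
    print(f"{name}: n={n}, estimate={estimate:.2f}")
```

The estimate alone cannot tell the three institutions apart, which is why a measure of confidence is needed on top of it.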
Suppose you have a sample and we now increase the sample size N. Will the size of the confidence interval grow, shrink, or stay the same? Pick one of the three.
And the answer is it will shrink. The confidence interval becomes smaller.
If you have a probability here that you're trying to estimate, and here is the empirical mean, then with few data points you might have a very wide confidence interval, but as the data increases, the boundaries of the confidence interval will move inward. Let me contrast this with a concept that we encountered previously, which is the standard deviation. Suppose you increase the sample size N. How would you expect the standard deviation of the sample to change: will it grow, shrink, or stay about the same?
And here the answer is that it stays about the same.
It turns out that the standard deviation of a distribution does not depend on the sample size. Whether you throw a coin 100 times or 1000 times, when you graph the outcomes the standard deviation is not affected by the sample size, but the confidence interval is. That's a really important difference. If there's one thing you want to take home from this class, it is that as you get more data as a statistician, your confidence increases, so your confidence interval shrinks, whereas of course more data doesn't change the standard deviation. In fact, if the standard deviation is σ, then the width of the confidence interval is effectively proportional to σ/√N. The exact width depends on a number of other factors; there could be a constant in front of it. But there is an interesting relationship. As you get more data, your estimate of the standard deviation will become more accurate, but it's not going to change that much. Your confidence interval will shrink, and we're going to look into this in more detail.
Let's go back to probability and to the simplest of all our examples, that of a coin flip. Assume that p is the probability that the coin comes up heads, and assume we flip the coin N times, which is the same as saying that we have a sample size of N. It gives us N outcomes, and in the past we computed the empirical mean and the variance, but now I will let you compute the confidence interval. Let's start. I'm going to ask you a series of questions, most of them fairly challenging, and at the end you'll be able to compute the confidence interval for a sample of N coin flips. Let me begin. What is the expected mean outcome when you flip this coin? Suppose 1 is heads and 0 is tails, and either comes up with a 0.5 chance. The answer should be a number between 0 and 1. It would be 0 if the outcome is always tails and 1 if the outcome is always heads.
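One way to see this difference is a small simulation, sketched here under some assumptions: a fair coin, a fixed random seed, and the 1.96 factor that the lecture only introduces later. The helper `sample_stats` is a name invented for this note.

```python
import math
import random

def sample_stats(n, p=0.5, seed=0):
    """Simulate n coin flips and return (mean, std, CI term) of the sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    flips = [1 if rng.random() < p else 0 for _ in range(n)]
    mean = sum(flips) / n
    variance = sum((x - mean) ** 2 for x in flips) / n
    std = math.sqrt(variance)
    # Assumed form of the CI term: 1.96 * sigma / sqrt(N)
    ci_term = 1.96 * std / math.sqrt(n)
    return mean, std, ci_term

for n in (100, 1000, 10000):
    mean, std, ci_term = sample_stats(n)
    print(f"N={n:>5}: mean={mean:.3f}  std={std:.3f}  CI term={ci_term:.4f}")
```

In a run like this the standard deviation should hover around 0.5 for every N, while the CI term keeps getting smaller as N grows.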
And the answer is 0.5. To see why: with probability p, which is 0.5, we get an outcome of 1, heads, and with probability 1-p, again 0.5, we get an outcome of 0, tails. Adding this up, 0.5 × 1 + 0.5 × 0, gives us 0.5.
A trickier question: what do you think is the variance of the outcome for this coin flip? That's a question you've never really been asked, and it's okay to get it wrong or punt on it, but perhaps you can give it a try. The math is very similar to the math that I provided here.
Well, you know that the variance is defined as the expected squared difference of the actual outcomes from the mean. There are two possible outcomes, outcome 1 and outcome 0, and both have probability 0.5. If 1 is the outcome, its difference from the mean is 1 - 0.5, and we square this. If 0 is the outcome, then its deviation from the mean is 0 - 0.5. Each of these is 0.25 after squaring, so weighting each by its probability of 0.5 we get 0.25 in total. You should check that the math is correct.
This is all basic probability and doesn't get us into confidence intervals, but let's do that now. Let's say we have samples of size 1, 2, and 10. We know that the variance of each individual coin flip is 0.25, but I want to add up all the outcomes. What is the mean when I just sum up the outcomes (I've not yet computed the average), and what is the variance when I sum up all the outcomes in each sample? Let's do the means first.
Hopefully, you guessed this correctly. For a single coin flip, it's 0.5, as written up here. For 2 it's 1, so on average the sum will be 1, and for 10 it'll be 5.
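To check the math, here is a minimal sketch that enumerates the two outcomes with their probabilities. The function name `coin_mean_and_variance` is hypothetical.

```python
def coin_mean_and_variance(p):
    """Mean and variance of one coin flip, with heads = 1 and tails = 0."""
    outcomes = [(1, p), (0, 1 - p)]  # (value, probability)
    mean = sum(value * prob for value, prob in outcomes)
    # Variance: probability-weighted squared difference from the mean
    variance = sum(prob * (value - mean) ** 2 for value, prob in outcomes)
    return mean, variance

mean, variance = coin_mean_and_variance(0.5)
print(mean, variance)  # prints: 0.5 0.25
```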
Let's now do the variances. Fill in those three numbers.
The first one we can copy over, but what is the second number? Well, you might vaguely remember that for the sum of two coin flips, or the sum of any two independent things, the variances just add up. With this formula in mind, we find that the variance of the sum of two coin flips is 0.5, and the variance of the sum of 10 coin flips is 2.5, which is 10 times the number over here.
And now I'm going to be really demanding. I would like to know what the variance of the mean is; not the sum, but the mean. I'm going to put 1/N in front of the same sum as before. This is what you do, for example, when you compute the mean outcome of an election from a sample of size N. You take the sum of the things you saw and divide by N. That's the formula for the mean, but now I'm asking for the variance of the mean. So I'm not asking for this one over here, I'm asking for this one over here. As a hint, at some point we talked about something like the following: if you multiply a variable X by a constant a, what happens to its variance? It was something times the variance of X. Let's do the first one. What goes in there?
Well, this one is surprisingly easy because N=1, so the 1/N falls out and it's the same as this value over here, 0.25.
The second one is harder. Give it a try. It's okay if you fail.
And this is the one thing that's really important: variance is a quadratic expression. A constant multiplier of the outcome comes out squared. We talked about this and you might have forgotten it, but that's not important. I think now that you see it again, you might hopefully remember it. That means if you multiply by 1/N inside the variance, you get 1/N² outside. N²=4 in this case, and 0.5/4 is 0.125.
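The table so far can be reproduced with a few lines, sketched here assuming the flips are independent so that both means and variances simply add up. The constant and function names are invented for this note.

```python
P_HEADS = 0.5
VAR_ONE_FLIP = P_HEADS * (1 - P_HEADS)  # 0.25 for a fair coin

def sum_mean(n):
    """Expected value of the sum of n coin flips."""
    return n * P_HEADS

def sum_variance(n):
    """Variance of the sum of n independent coin flips."""
    return n * VAR_ONE_FLIP

for n in (1, 2, 10):
    print(f"N={n:>2}: mean of sum={sum_mean(n)}, variance of sum={sum_variance(n)}")
```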
This is logical. I hope you'll be able to do the last one over here, for N=10.
The answer is 0.025. If you look at this, the variance of the sum increased with N. But now we're dividing 2.5 by 100, which is 10², so 2.5/100 gives 0.025. There's a really interesting insight here, which is that the spread of the mean, the thing you compute from the sample, goes down as N increases. For small N, it's 0.25, but for larger N like N=10, it's now 0.025.
We are now two steps away from the confidence interval. We're almost there. The first thing we do is go back from the quadratic expression down to the non-quadratic expression: the standard deviation that corresponds to the variance over here. You're ready to compute this now. Just give me those three values over here.
0.5 is the square root of 0.25, 0.3536 is the square root of 0.125, and the standard deviation over here, the square root of 0.025, is approximately 0.16.
And now we get to the confidence interval. Without further ado, just multiply this value over here by 1.96, the magic number. That should give us the size of the confidence interval.
Yes, the first one is 0.98, the second 0.69, and the third 0.31.
Now I should point out that this 1.96 trick isn't mathematically correct for very small sample sizes. It usually assumes we have at least 30 samples, but let's just ignore this for a second. What you've done is, for the coin flip example, actually compute confidence intervals yourself. That's a nontrivial thing to do, and because we like this so much, let's assume that we have a sample size of 100, and give me all of those numbers over here.
Now the mean of the sum is obviously 50, which is 0.5 × 100, and the variance of the sum will be 25, which is 100 × 0.25. But as we now divide the variance by 10,000, we get 0.0025. The square root of that happens to be 0.05, and multiplying by this magic number 1.96 gives us 0.098. The way we read this is: if after 100 coin flips we observed empirically that the probability of heads is 50%, there is still an approximate ±10% confidence interval. So we are really certain it'll fall between 40% and 60%, but within that range we're not that certain.
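The step from the variance of the sum to the variance of the mean can be sketched as follows, assuming a fair coin with a per-flip variance of 0.25. The function name `variance_of_mean` is made up for illustration.

```python
def variance_of_mean(n, var_one_flip=0.25):
    """Variance of the mean of n independent coin flips."""
    variance_of_sum = n * var_one_flip
    # Multiplying by the constant 1/N scales the variance by 1/N**2
    return variance_of_sum / n ** 2

for n in (1, 2, 10):
    print(f"N={n:>2}: variance of mean={variance_of_mean(n):.3f}")
```

Written this way it is easy to see that the result is just the per-flip variance divided by N, which is why it shrinks as the sample grows.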
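Putting the last two steps together, the following sketch computes the standard deviation of the mean and the CI term for the sample sizes used above. It assumes a fair coin, and `ci_term_fair_coin` is a hypothetical name.

```python
import math

Z_95 = 1.96  # the "magic number" from the lecture

def ci_term_fair_coin(n, var_one_flip=0.25):
    """The +/- term: 1.96 times the standard deviation of the mean."""
    variance_of_mean = var_one_flip / n
    std_of_mean = math.sqrt(variance_of_mean)
    return Z_95 * std_of_mean

for n in (1, 2, 10, 100):
    std_of_mean = math.sqrt(0.25 / n)
    print(f"N={n:>3}: std of mean={std_of_mean:.4f}, CI term={ci_term_fair_coin(n):.3f}")
```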
Let's pick a second exercise. We now have a loaded coin that mostly comes up tails; with a 0.1 chance it comes up heads. I gave you the formulas for how to compute the mean and the variance of the outcome. The mean is just the probability p, and the variance is p(1-p). This comes without proof, but in the next optional unit, we'll prove this expression together.
We studied the special case of a fair coin, p=0.5, and now I want to do the same thing for arbitrary p. It turns out that the real challenge is not to compute the confidence interval. The real challenge is to compute the variance of a coin flip where the probability of heads is p. In particular, let's do a quick check. Suppose p=0. What do you think the variance should be? Do this by thinking it through from first principles, not by trying to solve complex mathematical equations.
And of course the answer is 0. If p=0, the coin comes up tails all the time and as a result it won't vary; therefore there is no variance. The same goes for p=1. Once again we know the variance is 0. So if we were to draw the variance as a function of p, we already know it's 0.25 when the coin is most random, at p=0.5, and it goes down in some way all the way to 0 on both sides.
Here's a game I would like to play with you. I drew six different expressions on the right side, and I'm going to derive the variance for you, leaving out some of those expressions. Whenever I stop, you have to pick one on the right side. Click the appropriate radio button, or maybe two of them, and if they're correct, we'll move on. To calculate the variance of X, we first notice that p is the mean of X. If you look at the two possible outcomes, starting with heads and then tails, you find that the probability of heads is p, and if heads is chosen, then its contribution to the variance is the squared difference of the actual outcome, which is 1 for heads, minus the mean, which equals p. Complete the same expression for the tails case. What's the probability of tails? And if tails actually occurs, the outcome is 0. What will be the quadratic deviation we put into this formula over here?
And this is the correct one: tails comes up with probability 1 - p, and the difference between the outcome, 0, and the mean, p, is 0 - p, and we have to square it for the variance, giving (0 - p)² = p². So I take this expression away.
We multiply this out and then we're going to leave certain expressions open. In particular, I'll multiply out this squared expression first but leave the center term open, and on the right side, I'll leave only the last term open, so two of those options fit exactly in here. Please click the two appropriate boxes.
And obviously this is -2p, the guy over here, so let me put this in. And this one here is -p³, which sits over here. So let me put this in too and take those options away.
The next step is now straightforward: we multiply this guy out over here to get p - 2p² + p³, leaving this guy over here intact. And that simplifies to one of the expressions on the right side.
Obviously the p³ terms cancel out. Then we have -2p² over here plus 1p² over here, which makes -1p², and we can leave the p intact, so we get the expression on the bottom, p - p². We can factor p out to get p(1 - p). That's the variance of X for arbitrary p.
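Since the slides are not part of these notes, here is the derivation written out in one piece. It is a reconstruction from the spoken steps, using X for the outcome of a single flip with heads = 1 and tails = 0.

```latex
\begin{aligned}
\mathrm{Var}(X) &= p\,(1-p)^2 + (1-p)\,(0-p)^2 \\
                &= p\,(1 - 2p + p^2) + (1-p)\,p^2 \\
                &= p - 2p^2 + p^3 + p^2 - p^3 \\
                &= p - p^2 \\
                &= p\,(1-p)
\end{aligned}
```

The two terms filled in during the quiz are the -2p inside the first bracket and the -p³ that comes from multiplying out (1-p)p².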
Let's do another exercise. I'll give you a specific p, p=0.1. Calculate for me the variance.
The formula was p(1-p). With p = 0.1, that's 0.1 × 0.9 = 0.09.
For different values of N, I'd like to know the width of the confidence interval, using the same formula we used before.
Let's start with N=1. This involves several steps: you compute the standard deviation as the square root of the variance, you divide it by the square root of N, and you multiply by this funny factor 1.96. p(1-p) is 0.09, and the square root is 0.3. N=1, so it's 0.3 × 1.96, which gives 0.588.
I trust that you can now do the same for N = 10 and N = 100, using the exact same formula as before.
It's 0.186 for N=10, and not surprisingly, for N=100 it's 0.0588, exactly a factor of 10 smaller than the confidence interval for N=1.
Now that's really cool, because it means that for any coin and any sample size, you can compute for me how confident you are. If after flipping the coin you came back saying, wow, I believe p is 0.44, you can now say, well, given the number of samples I had, it's not just 0.44. It's a confidence interval. There's a certain term, maybe ±0.07, that gets added to the 0.44, and that's the range we can be confident about according to the methods studied so far. In practice, you are given a sample and you don't know the exact value of p. Suppose this is the sample you observed, with N=4, and we're going to apply the formula even though we understand it should only apply when N is equal to or larger than 30. Just for the exercise of applying the formula, compute for me the mean and the variance. For the variance, I don't mean the formula variance p(1-p); I mean the actual variance computed from the data sequence.
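The three steps described above (square root of the variance, divide by √N, multiply by 1.96) can be sketched for a coin with known p. The function name `ci_term` is an assumption made for this note.

```python
import math

def ci_term(p, n, z=1.96):
    """The +/- term for n flips of a coin with known heads probability p."""
    variance = p * (1 - p)             # variance of a single flip
    std = math.sqrt(variance)          # standard deviation of a single flip
    return z * std / math.sqrt(n)      # shrinks with the square root of n

for n in (1, 10, 100):
    print(f"N={n:>3}: CI term={ci_term(0.1, n):.4f}")
```

Going from N=1 to N=100 multiplies √N by 10, which is where the factor of 10 between the first and last result comes from.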
Obviously, the mean is 0.75. For the variance, you subtract 0.75 from each value in the data sequence, square the results, and take their mean. I hope that also gives you 0.1875.
Using our magic formula, we can compute the ± term of the confidence interval. Here's what it looks like, where we've replaced the analytic variance with the empirical variance over here, but otherwise it's the same as before. I just calculated this, and I get about 0.42. For this sample, that means that, between the probabilities 0 and 1, we believe the coin was most likely biased, with a probability of 0.75 for heads. Around that estimate we have a confidence interval that's 0.42 wide on both sides. In fact, it reaches beyond 1, which is a funny side effect of the normal approximation. This is the interval in which we are uncertain about what the correct value for the coin is. In fact, this one does include 0.5, so it could be that this coin is perfectly unbiased: 0.75-0.42 is 0.33, which is well below 0.5. Obviously, it's a small sample. Let's do the same thing for a larger sample, and this one is a quiz for you. Here's the sample. I've ordered it. There are three tails and seven heads. I want you to compute for me the mean and the ± term of the confidence interval. Please give me both numbers.
Now the mean is 0.7, and we get 0.28 for the term on the right. In my calculation, σ² is 0.21, and when I apply my magic formula, plugging in N=10, I get this guy right over here. What's interesting is that this CI term is now smaller because we have a larger sample.
Let me give you one last quiz. Now we have 400 tails and 600 heads. We flipped the coin 1000 times, which is a lot. Let's observe what happens to the mean and our much beloved width of the confidence interval. Just use our formula again.
As is easily seen, the mean is 0.6. Hope you got that right. The variance ends up at 0.24; unsurprisingly, it's 0.6 × (1 - 0.6). And the term on the right here is approximately 0.0304. That means with a thousand samples, we are really confident that the mean is 0.6, and we are only willing to consider a deviation of 0.03. Now I ask you, do you think, given this evidence, that this is a fair coin? You'd say absolutely not, because the confidence interval is so small. This is where the square root of N really helps us; it really drives the size of the confidence interval down. The larger the sample size, the better for us, and the more confident we are in our result.
So congratulations! You've learned about confidence intervals, and this is a really cool tool to use when drawing conclusions from sample data. It's much better to say here's the mean and here's how confident I am, as indicated by the interval, than to just give the mean. And you learned that the interval size shrinks as the sample size increases, so eventually, if you have lots of data, you get more and more confident.
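The same calculation can be done directly on the data. The notes do not preserve the actual sequence, so the sample below is an assumption: any ordering of three heads and one tail gives the stated mean of 0.75. Note that this sketch divides by N rather than N-1, which is what reproduces the 0.1875 from the lecture.

```python
# Assumed sample: three heads (1) and one tail (0); the order does not matter.
sample = [1, 1, 1, 0]

n = len(sample)
mean = sum(sample) / n
# Empirical variance: average squared difference from the mean (divide by N)
variance = sum((x - mean) ** 2 for x in sample) / n

print(mean, variance)  # prints: 0.75 0.1875
```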
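For samples given as raw data, the whole recipe fits in one small function. This is a sketch under the lecture's simplifications: the empirical variance divides by N, and the 1.96 factor is applied even though both samples are far smaller than 30. The name `mean_and_ci_term` and the exact ordering of the flips are assumptions.

```python
import math

def mean_and_ci_term(sample, z=1.96):
    """Return (mean, +/- term) for a list of 0/1 outcomes."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / n  # empirical variance
    return mean, z * math.sqrt(variance / n)

four_flips = [1, 1, 1, 0]                       # assumed order, mean 0.75
ten_flips = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]      # three tails, seven heads

for name, sample in (("N=4", four_flips), ("N=10", ten_flips)):
    mean, term = mean_and_ci_term(sample)
    print(f"{name}: {mean:.2f} +/- {term:.2f}")
```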
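When only the counts of heads and tails are known, the empirical variance of 0/1 data equals mean × (1 - mean), so the interval can be computed from the counts alone. The sketch below also checks whether 0.5, a fair coin, lies inside the interval; `interval_from_counts` is a hypothetical helper name.

```python
import math

def interval_from_counts(heads, tails, z=1.96):
    """Return (low, high) of the confidence interval for the heads probability."""
    n = heads + tails
    mean = heads / n
    variance = mean * (1 - mean)  # same as the empirical variance of 0/1 data
    term = z * math.sqrt(variance / n)
    return mean - term, mean + term

for heads, tails in ((3, 1), (600, 400)):
    low, high = interval_from_counts(heads, tails)
    fair_possible = low <= 0.5 <= high
    print(f"{heads} heads, {tails} tails: [{low:.3f}, {high:.3f}], "
          f"contains 0.5: {fair_possible}")
```

With four flips the interval is wide enough to include 0.5; with a thousand flips it no longer does.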