These are draft notes extracted from subtitles. Feel free to improve them. Contributions are most welcome. Thank you!
So now we talk about one of my favorite topics: hypothesis testing. The reason I love this topic is that it's really about decision making. We talked a lot about data, often called the sample, and we used statistics to extract information. Now we use more statistics to make decisions, and the decisions will be binary: a yes or a no. The reason I like this is that I think life is all about decision making. I think there is no meaning in doing statistics without eventually making at least one decision. This is the first time in this class that you're going to make an actual decision. So here is a box of pills for weight loss, and it says that with 90% probability, taking these pills, a substantial weight loss is guaranteed. Now suppose your job is not to decide whether to buy this box or not (I can promise you that in all likelihood those pills don't work) but to decide whether this claim is correct or not. So you really care about whether the company that sells this product made a correct claim. To do so, you talk to people who have taken this medicine and you ask the question: did you lose weight, yes or no? Say this is a magical medication, and 11 out of 15 people tell you yes, but 4 out of 15 people tell you no. Obviously, this is not 90%. In fact, what do you think it is in percent?
It is only 73.3%, which is 11/15. But that could just be sampling error. It could actually be 90% as advertised and you were just unlucky. So do we believe that with high probability this claim is correct, or is the data actually contradicting the claim, so that you should complain that it is incorrect? That's the essence of hypothesis testing: we decide whether a hypothesis is correct.
So this is the hypothesis. It's often called the null hypothesis, or H₀. Null is kind of a funny word, but that's what people commonly call it. And there's a contrary hypothesis that we evaluate this hypothesis against; we'll just call it the alternate hypothesis, or H₁. On the right side, I'm going to write examples of what these hypotheses could mean in this case. So here are five different choices. If you listen carefully, can you tell me which of these five choices (there's exactly one) corresponds to the null hypothesis that this medication works exactly as advertised, where p is the probability of losing weight?
And yes, this is the one: p = 0.9.
Now, the tricky question is: what do you think is the alternate hypothesis if I suspect that the medication doesn't work as well as advertised in terms of losing weight? It's a bit of a trick question because we haven't talked about it yet, so don't worry if you don't understand it or if you get it wrong.
I would say your counter hypothesis, the alternate hypothesis, is that p is smaller than 0.9. The reason is that I said the medication doesn't work as well as advertised. If I'd said it doesn't work as advertised and dropped the words "as well", then this might be the correct counter hypothesis, but those are harder to analyze. In reality, you often have a suspicion that you're on a specific side of the null hypothesis. So this is the one to pick over here.
So now we have two hypotheses and we'd like to test them. The null hypothesis is the basic hypothesis that something is correct, and unless it is proven wrong, we always believe the null hypothesis is actually correct. Either we trust the manufacturer, or we have sufficient evidence that with very high likelihood the null hypothesis is wrong and the alternate hypothesis is correct. Then we reject the null hypothesis and we tell the manufacturer: please reprint the label on the box, because the number there isn't quite correct. So let's do it. To recap, our sample has 11 yeses and 4 nos. Our null hypothesis is p = 0.9, and our alternate hypothesis, that it doesn't work as well as the null hypothesis says, is p < 0.9. This is exactly the setup you're going to study when we do a hypothesis test. If you look at the sample, it's obvious this is binomial, which is the fancy word for coin flips, binary experiments, with 15 different experiments and, under the null hypothesis, probability p = 0.9. So in this binomial distribution there are 16 different outcomes, from 0 to 15. The most likely ones are 13 and 14 if the null hypothesis is correct, and from there the probability goes back down. The key is to assume that H₀ is actually correct, so the probability is 0.9, and then find what's called the critical region, which holds 5% or less of the total probability of this binomial distribution. You place a mark over here that separates outcomes that invalidate the hypothesis from outcomes that validate it, so that all the outcomes to the left of this marker together collect at most the confidence level, in this case 5%. So if you find an outcome that sits to the left of this point, you can say: wow, this outcome was so insanely unlikely under the null hypothesis, I just don't buy it. If, conversely, the outcome is somewhere in this green box, then it isn't that unlikely after all. Realize that each outcome by itself might have a small likelihood, but you don't care about the probability of the outcome itself. Even the most likely outcome has a fairly small probability in this model. All you care about is putting a separator between the accepted region and what's called the critical region. Again, the logic is that the critical region collects many possible outcomes, but in totality these probabilities don't exceed 5%. So if the final outcome is in the critical region, you can confidently say: wow, I'm really surprised, I don't buy that H₀ is correct.
Let me write down, for every possible outcome of the binomial, the corresponding probability. I just wrote a little Python program implementing the familiar expression for the binomial distribution, and with this program I get the following. Practically speaking, all the first ones have zero probability. This one is about something times 10 to the minus 5, and then over here very small probabilities come up, and they go all the way to 0.34 for 14. Somewhere between 13 and 14 is the most likely outcome, but even those numbers aren't huge because there are so many outcomes, up to 15. You've built these tables before, so I spared you the work this time around. The very basic logic here is to now draw a line such that the total probability contained above the line doesn't exceed 5%. So here's a question for you: where would you draw the line? If you draw it here, for example, then give me the number just above the line, 3. If you draw it over here, it will be 13. Neither of those is correct. Draw the line where the probability contained above the line doesn't exceed 5%.
And the correct answer is after 10. The sum of all these values over here, up to the 10th, is approximately 0.012. There's a slight truncation issue here from working with rounded numbers. That is far away from 5%, which would be 0.05, but if you were now to add this guy over here, you'd go beyond 0.05.
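The table and the line can be reproduced with a few lines of Python. This is only a minimal sketch, not the instructor's actual program (which is not shown in these notes); the names `binomial_pmf`, `table`, `cutoff`, and `running_total` are made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0, alpha = 15, 0.9, 0.05  # sample size, null hypothesis, 5% threshold

# Probability of each outcome 0..n, assuming H0 is correct.
table = [binomial_pmf(k, n, p0) for k in range(n + 1)]

# Walk down the table from outcome 0, adding outcomes to the critical
# region while the running total stays at or below alpha.
running_total = 0.0
cutoff = -1  # -1 would mean the critical region is empty
for k, prob in enumerate(table):
    if running_total + prob > alpha:
        break
    running_total += prob
    cutoff = k

for k, prob in enumerate(table):
    region = "critical" if k <= cutoff else "accept"
    print(f"{k:2d}  {prob:.5f}  {region}")
print(f"line drawn after {cutoff}, total above the line: {running_total:.4f}")
```

Running it prints the 16 probabilities and marks outcomes 0 through 10 as critical; the untruncated total for that region is a little under 0.013, which matches the "0.012 approximately" read off the rounded table.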
So this is where we draw the line. Any outcome over here is critical. That means that for any of those outcomes, you'd be so surprised that we reject the null hypothesis, whereas any outcome in the region over here is okay, so we accept it. Notice a subtlety here: even outcome 11 is okay, despite the fact that 11 by itself is very surprising, since it comes with a probability of less than 5%. But that's not the way to look at this. The more often you sample, the more unlikely each individual outcome is. Look at the outcomes as blocks. When you implement hypothesis testing, there'll be something like a most likely outcome, somewhere between 13 and 14; I guess it's 14. That outcome has to be okay, obviously. If you see that, you'll be happy. Then you push this in one direction, towards H₁, the alternate hypothesis, until all the remaining probabilities cover 5% or less, and that's the critical region.
So let's give this another try. Last Saturday, I went to the magic store and bought a loaded coin. I paid a lot of money for it, and the wizard who sold me the coin told me the probability of heads equals 0.3. In fact, there are many buckets of coins, from the fair ones through the slightly loaded all the way to the fully loaded. Those guys are cheap, but this one over here was really expensive. So I wanted to make sure that I really got a loaded coin. Let's first ask: what's our null hypothesis? Here are our choices for the probability of heads. Pick the one that best describes the null hypothesis.
And yes, it was a difficult question, if only because it may have confused you with the frequent occurrence of 0.7, which makes no sense whatsoever. This is the correct one: p = 0.3.
Now my suspicion is that they just sold me a fair coin, or something closer to a fair coin. That's a one-sided suspicion. Which of these best describes the alternate hypothesis? Check one of the remaining seven boxes.
And in this case we should pick p > 0.3. That means it's more like a fair coin than this one over here. It could be unfair in the other direction, but this just tests whether the number over here is correct.
So now I draw my sample: tails, heads, and so on, and I get the following 11 outcomes. What is the observed value for p?
It's 0.45, which is 5/11.
So again, now the question: at the 5% confidence level, tell me, should we return the coin because the one I was given is more fair than the one I wanted, or should I accept it? To make it a little easier, I will calculate for you the binomial probabilities under the null hypothesis p = 0.3 for the 12 possible outcomes, from 0 heads all the way to 11 heads. So here are the approximate values in this table. Before you make your decision, let me ask you a couple of easier questions. What is the most likely outcome for 11 coin flips, assuming that H₀ is correct? And the answer is 3.
Technically, 0.3 times 11 is 3.3, but 3.3 cannot be the outcome; it has to be an integer. 3 has the highest probability, as seen in the table over here.
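To check the most likely outcome without reading it off the table, one can simply take the outcome with the largest binomial probability. A small sketch, with `pmf` and `most_likely` as illustrative names that do not come from the lecture:

```python
from math import comb

n, p0 = 11, 0.3  # 11 coin flips, null hypothesis P(heads) = 0.3

# Binomial probability of each possible number of heads, 0..11.
pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]

# The most likely outcome is the index with the largest probability.
most_likely = max(range(n + 1), key=lambda k: pmf[k])
print(most_likely, round(pmf[most_likely], 3))
```

The expected value 3.3 is not itself an outcome; the largest probability sits at the neighbouring integer 3, at roughly 0.26.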
Next question. The critical region--is it to the left or to the right of 3 in this table?
And the answer is to the right. We're not asking the question of whether the coin's probability differs from 0.3, at least not for now.
So now the $100 question: what is the smallest outcome in the critical region? Put your answer right here.
The answer is 7. The probabilities from 7 upward sum to less than 0.05. If you move the line one step to the left, you cross the 5% confidence boundary.
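For a right-sided alternate hypothesis the same idea runs from the top of the table downward. The following is a hedged sketch; the helper `right_critical_start` is a hypothetical name introduced here, not something from the lecture.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def right_critical_start(n, p0, alpha):
    """Smallest k with P(X >= k) <= alpha under H0, or None if there is none."""
    tail = 0.0
    start = None
    for k in range(n, -1, -1):  # start at n heads and move left
        tail += binomial_pmf(k, n, p0)
        if tail > alpha:
            break
        start = k
    return start

n, p0, alpha = 11, 0.3, 0.05
start = right_critical_start(n, p0, alpha)
tail_prob = sum(binomial_pmf(k, n, p0) for k in range(start, n + 1))
print(start, round(tail_prob, 4))
```

With these inputs the region starts at 7 heads and holds a bit more than 2% of the probability; including 6 heads would add more than 5% on its own.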
I'm going to ask you: should you return the coin? Yes or no?
I would argue you're safe. There are five heads in the sequence. That's your actual outcome, and it's well within the safe region. We'd only return the coin if we had seen heads seven times or more.
Now let's look at one more example. This time we go to a bank and hope to get a fair coin. For this coin over here, the null hypothesis is p = 0.5, and our alternate hypothesis is a two-sided hypothesis, with the probability either smaller than 0.5 or larger than 0.5, which gives us p ≠ 0.5. To see where this concept will lead, recall that in all we've done so far, we assumed that H₀ is correct, computed some sort of distribution, and then cut out a critical region such that the area underneath it is at most 5%, assuming a 5% confidence level. Now in the two-sided test, what we do is cut out a small region on the left but also one on the right, such that the area on the left doesn't exceed half of 5%, that is 2.5%, and the same for the area on the right. Now we've split the critical region into two areas. We have a two-sided test, also called a two-tailed test, and in totality the two critical regions on the left and right don't exceed 5%. This looks awfully like a confidence interval, right? Let's flip the coin. In 14 experiments, heads comes up exactly 3 times and tails comes up 11 times. Let's do the analysis. In this table, I've listed for you the probabilities under the binomial distribution for each possible outcome, from 0 to 14, as before. We go 0, 0.005, 0.022, 0.06, 0.12, 0.18, 0.21 (this is obviously the most likely outcome under the null hypothesis), and then it goes down exactly the way it went up over here. I've also added check boxes. I want you to check exactly those that define the critical region, so that the total probability in the critical region does not exceed 5%. Remember, this is a two-tailed test.
This one was tricky. You start counting on the left and on the right, and these points over here define the critical region. If you were to move it inward, you'd find that 0.022 plus 0.005 becomes larger than 0.025, which is just half of the confidence level that you have to exclude over here. Since we know that each of these must be smaller than or equal to 2.5%, we have to exclude 3 and 11 from the critical region, and an outcome of 3 is fine for this two-sided test.
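The two-tailed rule (at most 2.5% on each side) can be sketched as follows. The function name `two_tailed_critical_region` is an assumption made for this illustration, and the split into two equal halves follows the description above.

```python
from math import comb

def two_tailed_critical_region(n, p0, alpha):
    """Outcomes to reject, allowing at most alpha/2 of probability in each tail."""
    pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    half = alpha / 2
    region = set()

    total = 0.0
    for k in range(n + 1):        # left tail: 0, 1, 2, ...
        if total + pmf[k] > half:
            break
        total += pmf[k]
        region.add(k)

    total = 0.0
    for k in range(n, -1, -1):    # right tail: n, n-1, ...
        if total + pmf[k] > half:
            break
        total += pmf[k]
        region.add(k)

    return sorted(region)

region = two_tailed_critical_region(n=14, p0=0.5, alpha=0.05)
observed_heads = 3
print(region, observed_heads in region)
```

For 14 flips of a fair coin this gives the outcomes 0, 1, 2 on the left and 12, 13, 14 on the right, so the observed 3 heads stays outside the critical region.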
Suppose you only worry about the following: is p smaller than 0.5, as the alternate hypothesis? Which outcomes over here would now define the critical region?
And here is my answer: the sum of those guys over here is about 0.0287, and that's smaller than our 5%. In this case, you would reject the null hypothesis when the number of heads equals 3. But what about the coin? I often don't have a one-sided hypothesis; I have a two-sided alternate hypothesis.
Note: in the video the instructor states the total probability of this critical region as 0.287. The correct value is 0.0287 (0.027 if you use the rounded values from the table).
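The corrected number is easy to verify exactly, because under p = 0.5 every sequence of 14 flips is equally likely. A short sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction
from math import comb

n = 14
alpha = Fraction(5, 100)

# Under H0 (p = 0.5) each of the 2**14 head/tail sequences is equally likely,
# so P(X = k) = comb(14, k) / 2**14.
left_tail = sum(Fraction(comb(n, k), 2**n) for k in range(4))  # outcomes 0..3
next_outcome = Fraction(comb(n, 4), 2**n)                      # outcome 4

print(float(left_tail))             # total for the one-sided region 0..3
print(left_tail <= alpha)           # region 0..3 fits under 5%
print(left_tail + next_outcome <= alpha)  # adding outcome 4 would not
```

The total is 470/16384, which is about 0.0287, so in the one-sided test the region 0 to 3 fits under 5% while adding outcome 4 would push it over.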
Now you're going to put everything together for me. I'll give you a new problem, and you'll answer all the questions to demonstrate to me that you fully understood what I'm talking about here. A treatment against cancer is advertised by the manufacturer to work in 80% of all cases. Your suspicion is that the drug doesn't work as well as advertised. Suppose 10 patients have been treated with this drug. What's the most likely number, so to speak, of healthy people? That's a simple question.
And obviously the answer is 8. 8 out of 10 is 80%.
Now, we suspect the drug doesn't work as well as advertised, and here is the hard question. Where does the critical region end? Put differently, what is the largest number of healthy people in the critical region, the number that defines the critical region? What's the largest number of healthy people out of the 10 where you'd be suspicious and, using our 95%/5% confidence threshold, you'd really reject the claim made by the manufacturer of this drug?
The first thing we do is notice that it's a one-sided test. That is, if this drug were even better than advertised, we wouldn't really be that concerned, to be honest. Then we graph the binomial just like before. Here are the outcomes and here are the probabilities. Obviously, 8 is the most likely outcome, assuming the null hypothesis is correct. The critical region is now defined from the left side, where fewer people are healthy and the treatment doesn't work as well as advertised. Let's add up until we reach 0.05, which is right over here. The sum happens to be 0.032, which is below 5%. So 5 is the correct answer over here.
Now, in our final question, I give you the observed outcome. Five people were healthy and five remained sick, without a change in the cancer, in the N = 10 experiments. My question now: do you reject the claim that this treatment will heal 80% of the people? Yes or no?
And the answer is yes. Five healthy (that's the 5 to look at, not the other one) falls into the critical region. In fact, it's included in the critical region that goes from 0 all the way to 5, and therefore we reject this claim over here at a 95% confidence level, or 5% error probability.
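The whole exercise, from null hypothesis to decision, fits in one small function. This is a sketch under the assumptions used throughout these notes (binomial model, left-sided alternate hypothesis, 5% threshold); `reject_left_tailed` is a made-up name.

```python
from math import comb

def reject_left_tailed(n, p0, observed, alpha=0.05):
    """Left-tailed binomial test.

    Returns (reject, boundary), where boundary is the largest outcome in the
    critical region (-1 if the region is empty) and reject says whether the
    observed count falls inside it.
    """
    total = 0.0
    boundary = -1
    for k in range(n + 1):
        prob = comb(n, k) * p0**k * (1 - p0)**(n - k)
        if total + prob > alpha:
            break
        total += prob
        boundary = k
    return observed <= boundary, boundary

reject, boundary = reject_left_tailed(n=10, p0=0.8, observed=5)
print(reject, boundary)
```

With 10 patients and a claimed 80% success rate, the critical region runs from 0 to 5 healthy patients, so an observed count of 5 leads to rejecting the claim, while 6 would not.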
This is cool! You made it! You now understand the very basics of hypothesis testing. We talked about critical regions. We talked about the null hypothesis. We defined the critical region in the one-sided test and in the two-sided test. Assuming that the null hypothesis was correct, if the actual outcome fell into the critical region, we became suspicious and rejected the hypothesis, whereas if it was consistent with the null hypothesis at the 95% level, we'd be okay. That's the essence of hypothesis testing. Next time we'll tie this concept together with confidence intervals and some of the more advanced statistical concepts we've learned for sampling from data.