These are draft notes extracted from subtitles. Feel free to improve them. Contributions are most welcome. Thank you!
So this unit is a tough one. We're going to talk about perhaps the holy grail of probabilistic inference. It's called Bayes Rule. Bayes Rule is named after Reverend Thomas Bayes, who used this principle to infer the existence of God, but in doing so, he created a new family of methods that has vastly influenced artificial intelligence and statistics. So let's dive in!
Let's use the cancer example from my last unit. There's a specific cancer that occurs in 1% of the population, and there's a test for this cancer that with 90% chance is positive if you have this cancer, C. That's usually called the sensitivity. But the test sometimes is positive even if you don't have C. Let's say with another 90% chance it's negative if you don't have C. That's usually called the specificity. So here's my question. Without further symptoms, you take the test, and the test comes back positive. What do you think is the probability of having that specific type of cancer?

To answer this, let's draw a diagram. Suppose these are all of the people, and some of them, exactly 1%, have cancer. 99% are cancer free. We know there's a test that, if you have cancer, correctly diagnoses it with 90% chance. So if we draw the area where the test is positive, cancer and test positive, then this area over here is 90% of the cancer circle. However, this isn't the full truth. The test sometimes comes out positive even if the person doesn't have cancer. In fact, in our case, that happens in 10% of all cancer-free cases. So we have to add more area, as big as 10% of this large area, where the test might go positive but the person doesn't have cancer. So this blue area is 10% of all the area over here minus the little small cancer circle. And clearly, all the area outside these circles corresponds to a situation of no cancer where the test is negative.

So let me ask you again. Suppose we have a positive test. With a prior probability of cancer of 1% and a sensitivity and specificity of 90%, do you think your new chances are now 90%, or 8%, or still just 1%?
Where: ∩ means the intersection of the two sets.
And I would argue it's about 8%. In fact, as we'll see, it comes out at 8 1/3% mathematically. And the way to see this in the diagram is that this is the region that tests positive. By having a positive test, you know you're in this region, and nothing else matters. You know you're in this circle. But within this circle, the ratio of the cancerous region relative to the entire region is still pretty small. Obviously, having a positive test changes your cancer probability, but it only increases it by a factor of about 8, as we will see in a second.
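The area argument can also be checked by counting people. The sketch below is only an illustration: the population size of 10,000 is a hypothetical number chosen so that every group comes out as a whole number, and the variable names are made up for this example.

```python
# Hypothetical population size, chosen only so the counts are whole numbers.
population = 10_000

with_cancer = population * 1 // 100            # 1% prior -> 100 people
without_cancer = population - with_cancer      # 9,900 people

true_positives = with_cancer * 90 // 100       # 90% sensitivity -> 90 people
false_positives = without_cancer * 10 // 100   # 10% of cancer-free people -> 990

# Everyone inside the "test is positive" region of the diagram.
all_positives = true_positives + false_positives

# Fraction of the positive region that actually has cancer.
share_with_cancer = true_positives / all_positives

print(true_positives, false_positives, all_positives)  # 90 990 1080
print(round(share_with_cancer, 4))                      # 0.0833
```

Out of 1,080 people who test positive, only 90 have the cancer, which is the 8 1/3% mentioned above.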
So this is the essence of Bayes Rule, which I'll give to you in a second. There's some sort of a prior, by which we mean the probability before you run a test, and then you get some evidence from the test itself, and that all leads you to what's called a posterior probability. Now this is not really a plus operation. In fact, in reality, it's more like a multiplication, but semantically, what Bayes Rule does is incorporate some evidence from the test into your prior probability to arrive at a posterior probability.

So let's make this specific. In our cancer example, we know that the prior probability of cancer is 0.01, which is the same as 1%. The posterior probability of cancer given that our test is positive, abbreviated here as Pos, is the product of the prior times our test sensitivity, which is the chance of a positive result given that I have cancer. And you might remember, this was 0.9, or 90%. Now just to warn you, this isn't quite correct. To make this correct, we also have to compute the posterior for the non-cancer option, that is, no cancer given a positive test. Using the prior, we know that P(¬C) is 0.99. It's 1 minus P(C). We multiply that by the probability of getting a positive test result given ¬C. Realize these 2 equations are the same, but I exchanged C for ¬C. And this one over here takes a moment to compute. We know that our test gives us a negative result if we're cancer free with 0.9 chance. As a result, it gives us a positive result in the cancer-free case with 10% chance. What's interesting is that this is about the correct equation, except the probabilities don't add up to 1. I'm going to ask you to compute those, so please give me the exact numbers for the first expression and the second expression written over here, using our example up there.
Obviously, P(C) × P(Pos│C) is 0.01 × 0.9 = 0.009, whereas 0.99 × 0.1, this guy over here, is 0.099. What we've computed here is the absolute area in here, which is 0.009, and the absolute area in here, which is 0.099.
The normalization proceeds in two steps. We just normalize these guys to keep the ratio the same but make sure they add up to 1. So let's first compute the sum of these two guys. Please let me know what it is.
And, yes, the answer is 0.108. Technically, what this really means is the probability of a positive test result, which is the area in the circle that I just marked. By virtue of what we learned last, it's just the sum of the two things over here, which gives us 0.108.
And now finally, we come up with the actual posterior, whereas this one over here is often called the joint probability of two events. The posterior is obtained by dividing this guy over here by this normalizer. So let's do this over here: let's divide this guy by the normalizer to get my posterior probability of having cancer given that I received the positive test result. So divide this number by this one.
The answer is 0.0833 approximately.
Let's do the same for the non-cancer version: pick the number over here and divide it by this same normalizer.
The answer is 0.9167 approximately.
Why don't you for a second add these two numbers and give me the result?
And the answer is 1, as you would expect. Now this was really challenging. You can see a lot of math in this slide. Let me just go over this again and make it much, much easier for you.
Well, we really said that we had a situation with a prior P(C), and a test with a certain sensitivity P(Pos│C) and a certain specificity P(Neg│¬C). When you receive, say, a positive test result, what you do is take your prior P(C) and multiply in the probability of this test result given C, and you take P(¬C) and multiply in the probability of the test result given ¬C. So, this is your branch for the consideration that you have cancer. This is your branch for the consideration of no cancer. When you're done with this, you arrive at a number that combines the hypothesis with the test result, both for the cancer hypothesis and the no-cancer hypothesis. Now, what you do is add those up, and they normally don't add up to one. You get a certain quantity which happens to be the total probability that the test is what it was, in this case positive. And all you do next is divide, or normalize, this thing over here by the sum over here, and the same on the right side. The divisor is the same for both cases, because this is your cancer branch and your non-cancer branch, but this sum does not depend on the cancer variable anymore. What you now get out is the desired posterior probability, and those add up to 1 if you did everything correctly, as shown over here. This is the algorithm for Bayes Rule.
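The recipe just described (multiply, add, divide) might be sketched in a few lines of Python. The function name `bayes_two_hypotheses` and its parameter names are made up for this illustration and do not come from the lecture; the numbers are the ones from the cancer example with a positive test.

```python
def bayes_two_hypotheses(prior_c, p_result_given_c, p_result_given_not_c):
    """Return (posterior for C, posterior for not C, normalizer)
    for one observed test result."""
    # Branch 1: the hypothesis holds, times the chance of this result.
    joint_c = prior_c * p_result_given_c
    # Branch 2: the hypothesis does not hold, times the chance of this result.
    joint_not_c = (1 - prior_c) * p_result_given_not_c
    # The sum is the total probability of seeing this test result.
    normalizer = joint_c + joint_not_c
    # Dividing both branches by the same sum makes them add up to 1.
    return joint_c / normalizer, joint_not_c / normalizer, normalizer


# Positive test: P(C) = 0.01, P(Pos|C) = 0.9, P(Pos|not C) = 1 - 0.9 = 0.1
post_c, post_not_c, p_pos = bayes_two_hypotheses(0.01, 0.9, 0.1)
print(round(p_pos, 3), round(post_c, 4), round(post_not_c, 4))
# 0.108 0.0833 0.9167
```

For a negative test, the same function would be called with P(Neg│C) and P(Neg│¬C) in place of the two positive-result probabilities.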
Now, the same algorithm works if your test says negative. We'll practice this in just 1 second. Suppose your test result says negative. You could still ask the same question: now, what's my probability of having cancer or not? But now all the positives in here become negatives. The sum is the total probability of a negative test result, and if we now divide by this sum, we get the posterior probability for cancer and non-cancer assuming you had a negative test result, which of course should be much, much more favorable for you, because none of us wants to have cancer. So, look at this for a while, and let's now do the calculation for the negative case using the same numbers I gave you before, step by step this time around, so I can really guide you through the process.
We begin with our prior probability, our sensitivity and our specificity, and I want you to begin by filling in all the missing values. So, there's the probability of no cancer, the probability of negative, which is the negation of positive, given C, and the probability of positive given not C.
And obviously this is 0.99 as before, then 0.1 and 0.1. I hope you got this correct.
Now assume the test comes back negative. The same logic applies as before. So please give me the joint probability of cancer and the negative test result, and the joint probability of being cancer-free and the negative test result.
The number here is 0.001, and it's the product of my prior for cancer, which is 0.01, and the probability of getting a negative result in the case of cancer, which is right over here, 0.1. If I multiply these two things together, I get 0.001. The probability here is 0.891. What I'm multiplying is the prior probability of not having cancer, which is 0.99, with the probability of seeing a negative result in the case of not having cancer, and that is the one right over here, 0.9. So, if we multiply 0.99 with 0.9, we get 0.891.
Let's compute the normalizer. You now remember what this was.
And the answer is 0.892. You just add up these two values over here.
And now finally, tell me what is the posterior probability of cancer given that we had a negative test result, and the probability of no cancer given a negative test result. Please give me the numbers here.
This is approximately 0.0011, which we get by dividing 0.001 by the normalizer 0.892, and the posterior probability of being cancer-free after the test is approximately 0.9989, and that's obtained by dividing this probability over here by the normalizer. Not surprisingly, these two values indeed add up to 1. Now, what's remarkable about this outcome is really what it means. Before the test, we had a 1% chance of having cancer. Now, we have about a 0.11% chance of having cancer. So, our cancer probability went down by about a factor of 9. So, the test really helped us gain confidence that we are cancer-free. Conversely, before we had a 99% chance of being cancer-free, and now it's 99.89%. So, all the numbers are working exactly how we expect them to work.
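As a quick check of the negative-test numbers, here is a minimal sketch that walks through the same steps with the values from the cancer example. The variable names, including `shrink_factor`, are made up for this illustration.

```python
prior_c = 0.01                 # P(C)
p_neg_given_c = 1 - 0.9        # P(Neg|C) = 1 - sensitivity = 0.1
p_neg_given_not_c = 0.9        # P(Neg|not C) = specificity

# Joint probabilities of each hypothesis together with a negative test.
joint_c_neg = prior_c * p_neg_given_c                  # about 0.001
joint_not_c_neg = (1 - prior_c) * p_neg_given_not_c    # about 0.891

# Total probability of a negative test result.
p_neg = joint_c_neg + joint_not_c_neg                  # about 0.892

posterior_c = joint_c_neg / p_neg
posterior_not_c = joint_not_c_neg / p_neg

# How many times smaller the cancer probability is after the negative test.
shrink_factor = prior_c / posterior_c

print(round(posterior_c, 4), round(posterior_not_c, 4))  # 0.0011 0.9989
print(round(shrink_factor, 1))                            # 8.9
```

A posterior of 0.0011 is about 0.11%, so the cancer probability drops from 1% by roughly a factor of 9.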
Let me now make your life harder. Suppose our probability of a certain other kind of disease is 0.1, so 10% of the population has it. Our test in the positive case is really informative, but there's only a 0.5 chance that if I'm cancer-free the test, indeed, says the same thing. So the sensitivity is high, but the specificity is lower. Let's start by filling in the first 3 of them.
Obviously, these are just 1 minus those: 0.9, 0.1, and 0.5. Notice that these two numbers may very well be different. There is no contradiction here. These guys have to add up to 1, so given ¬C, the probabilities of positive and negative have to add up to 1, but these guys don't. It takes a lot of practice to understand which numbers have to add up to 1. But I set it up in a way that you should have gotten it right.
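Which numbers must add up to 1 can be spelled out in a short sketch. It assumes the values from this example: a prior of 0.1, a sensitivity of 0.9 (implied by P(Neg│C) = 0.1), and a specificity of 0.5. The variable names are illustrative only.

```python
prior_c = 0.1             # P(C)
p_pos_given_c = 0.9       # P(Pos|C), sensitivity
p_neg_given_not_c = 0.5   # P(Neg|not C), specificity

# Each missing value is the complement of a given one.
p_not_c = 1 - prior_c                       # P(not C)     = 0.9
p_neg_given_c = 1 - p_pos_given_c           # P(Neg|C)     = 0.1
p_pos_given_not_c = 1 - p_neg_given_not_c   # P(Pos|not C) = 0.5

# These pairs share the same condition, so each pair sums to 1.
print(round(prior_c + p_not_c, 2))                        # 1.0
print(round(p_pos_given_c + p_neg_given_c, 2))            # 1.0
print(round(p_pos_given_not_c + p_neg_given_not_c, 2))    # 1.0

# This pair has different conditions (C versus not C), so it need not sum to 1.
print(round(p_pos_given_c + p_pos_given_not_c, 2))        # 1.4
```

The rule of thumb the sketch illustrates: probabilities add up to 1 only across the outcomes of one variable under the same condition.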
Now comes the hard part: What is P(C, Neg)?
And the answer is 0.01. P(C) = 0.1, and P(Neg│C) is also 0.1, so if you multiply those two, you get 0.01.
Now let's solve for P(¬C, Neg).
And the answer is 0.45. P(¬C) is 0.9, and P(Neg│¬C) is 0.5. So 0.9 × 0.5 = 0.45.
What's the sum over here?
Well, you just add up these two numbers to get 0.46.
So tell me what the final two numbers are.
The first one is 0.01 divided by the normalizer 0.46, and that gives us 0.0217, and the second one over here is 0.45 divided by 0.46, and that gives us 0.9783. These are the correct posteriors. We started with a chance of 10% of having cancer. We had a negative result. We're now down to about 2% of having cancer.
Let's now consider the case that the test result is positive, and I want you tojust give me the two numbers over here and not the other ones.
So once again, we have 0.9, 0.1, and 0.5 over here. Very quickly, multiplying this guy with this one over here gives 0.09. This guy with this one over here gives 0.45. Adding them up gives us 0.54, and dividing those correspondingly, 0.09 divided by 0.54 gives us 0.166 and so on, and dividing 0.45 by 0.54 gives 0.833 and so on. And this means that with the positive test result, our chance of cancer increased from 0.1 to about 0.167. Obviously, our chance of having no cancer decreased accordingly. You got this, so let's just summarize.
In Bayes rule, we have a hidden variable we care about, whether someone has cancer or not. But we can't measure it directly, and instead we have a test. We have a prior for how frequently this variable is true, and the test is generally characterized by how often it says positive when the variable is true and how often it says negative when the variable is false. Bayes rule takes the prior and multiplies in the measurement, which in this case we assume to be the positive measurement, to give us a new variable, and does the same for the actual measurement given the opposite assumption about our hidden variable of cancer, and that multiplication gives us this guy over here. We add those two things up, which gives us a new variable, and then we divide these guys by it to arrive at the best estimate of the hidden variable C given our test result. In this example, I used a positive test result, but we can do the same with a negative one.

This was exactly the same as in our diagram in the beginning. There was a prior for our case, that this specific variable is true. We noticed that inside this prior there is a region for which our test result applies. We noticed that the test result also applies when the condition is not fulfilled. So, this expression over here and this expression over here correspond exactly to the red area over here and the green area over here. But then we noticed that these two areas don't add up to 1. The reason is that there's lots of stuff outside, so we calculated the total area, which was this expression over here, P(Pos). And then we normalized these two things over here by the total area to get the relative area that is assigned to the red thing versus the green thing, just by dividing by the total area in this region over here, thereby getting rid of any of the other cases.
Now, I should say, if you got this, you've learned something really significant about statistics and probability. This is totally nontrivial, but it comes in very handy. So, I'm going to practice this with you using a second example. In this case, you are a robot. This robot lives in a world of exactly two places. There is a red place and a green place, R and G. Now, let's say that initially this robot has no clue where it is, so the prior probability for either place, red or green, is 0.5. It also has a sensor, so it can see through its eyes, but its sensor seems to be somewhat unreliable. So, the probability of seeing red at the red grid cell is 0.8, and the probability of seeing green at the green cell is also 0.8. Now, suppose the robot sees red. What are now the posterior probabilities that the robot is at the red cell given that it just saw red, and conversely, what's the probability that it's at the green cell even though it saw red? Now, you can apply Bayes Rule and figure that out.
In this example, it gives us funny numbers: the answer for red is 0.8 and the one for green is 0.2. And it all has to do with the fact that in the beginning the robot had no clue where it was. The joint for red after seeing red is 0.4. The same for green is 0.1. 0.4 + 0.1 sums to 0.5. If you normalize, 0.4 divided by 0.5 gives 0.8, and 0.1 divided by 0.5 gives 0.2.
If I now change some parameters, say the robot knows that the prior probability that it's at red is 0, and therefore the prior probability that it's at the green cell is 1. Please calculate once again, using Bayes rule, these posteriors. I have to warn you, this is a bit of a tricky case.
And the answer is, the prior isn't affected by the measurement, so the probability of being at red is 0, and the probability of being at green is 1, despite the fact that it saw red. To see this, you find that the joint of being at red and seeing red is 0 times 0.8, that's 0. The same joint for green is 1 times 0.2. So you have to normalize 0 and 0.2. The sum of those is 0.2. So dividing 0 by 0.2 gives us 0, and 0.2 divided by 0.2 gives us 1. These are exactly the numbers over here.
Let's change this example even further. Let's make this over here, the probability of seeing green at the green cell, a 0.5, and revert back to a uniform prior. Please go ahead and calculate the posterior probability.
Now the answer is about 0.615 and 0.385. These are approximate. Once again, 0.5 times 0.8 is 0.4. 1 minus this guy is again 0.5, and 0.5 times 0.5 is 0.25. Add those up and you get 0.65. Normalizing, 0.4 divided by 0.65 gives approximately 0.615, and 0.25 divided by 0.65 is approximately 0.385. So now you've got it.
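The three robot variants can be run side by side. The helper `posterior_red_green` is a hypothetical name introduced only for this sketch; it assumes the robot has just seen red and takes the prior for the red cell plus the two sensor probabilities.

```python
def posterior_red_green(prior_red, p_see_red_at_red, p_see_green_at_green):
    """Posterior for (red cell, green cell) after the robot sees red."""
    joint_red = prior_red * p_see_red_at_red
    # Seeing red at the green cell is the complement of seeing green there.
    joint_green = (1 - prior_red) * (1 - p_see_green_at_green)
    normalizer = joint_red + joint_green
    return joint_red / normalizer, joint_green / normalizer


cases = [
    (0.5, 0.8, 0.8),  # uniform prior, sensor right 80% of the time
    (0.0, 0.8, 0.8),  # robot is certain it is not at red
    (0.5, 0.8, 0.5),  # uniform prior, sees green at green only half the time
]
for prior_red, at_red, at_green in cases:
    red, green = posterior_red_green(prior_red, at_red, at_green)
    print(round(red, 3), round(green, 3))
# 0.8 0.2
# 0.0 1.0
# 0.615 0.385
```

The second case shows why a prior of exactly 0 is tricky: its joint is 0 no matter what the sensor reports, so no measurement can move it.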
I will now make it harder. Suppose there are 3 places in the world, not just 2. There is 1 red one and there are 2 green ones. And for simplicity, we'll call them A, B, and C. Let's assume that all of them have the same prior probability of 1/3, or 0.333 and so on. Let's say the robot sees red, and as before, the probability of seeing red in Cell A is 0.9. The probability of seeing green in Cell B is 0.9. The probability of seeing green in Cell C is also 0.9. So what I've changed is, I've given the hidden variable, kind of like the cancer/non-cancer variable, 3 states. It's not just 2 as before, A or B. It's now A, B, or C. Let's solve this problem together, because it follows exactly the same recipe as before, even though it might not be obvious. So let me ask you, what is the joint of being in Cell A after having seen the red color? This is the joint as before.
And just like before, we multiply the prior, this guy over here, by the probability of seeing red in Cell A, 0.9, and that gives you 0.3.
What's the joint for Cell B?
Well, the answer is you multiply our prior of 1/3 by the probability of seeing red in Cell B. We see green there with 0.9 probability, so red is 0.1. So 0.1 times this guy over here gives 0.033.
Finally, probability of C and Red. What is that?
And the answer is exactly the same as this over here, because the prior is the same for B and C, and those probabilities are the same for B and C, so they should be exactly the same.
So here's the $100,000 question. What is our normalizer?
And the answer is, you just add those up: 0.3 + 0.033 + 0.033 is about 0.367.
Now we calculate the posterior probability for all 3 possible outcomes. So please plug them in over here.
As usual, we divide this guy over here by the normalizer, which gives us 0.818. Realize all these numbers are a little bit approximate here. Same for this guy, approximately 0.091. And this is completely symmetrical, 0.091. And surprise, these guys all add up to 1.
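With three cells the recipe is unchanged, only the bookkeeping grows. Below is a minimal sketch that uses dictionaries keyed by cell name; the helper `normalize` and the dictionary names are assumptions made for this illustration.

```python
def normalize(joints):
    """Divide every joint probability by their sum so the results add to 1."""
    total = sum(joints.values())
    return {state: value / total for state, value in joints.items()}


# Uniform prior over the three cells.
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}

# Probability of seeing red in each cell. A is red (0.9); B and C are
# green cells that show green with 0.9, so red with 1 - 0.9 = 0.1.
p_red_given_cell = {"A": 0.9, "B": 0.1, "C": 0.1}

# Joint of being in each cell and seeing red.
joints = {cell: prior[cell] * p_red_given_cell[cell] for cell in prior}

posterior = normalize(joints)
print({cell: round(p, 3) for cell, p in posterior.items()})
# {'A': 0.818, 'B': 0.091, 'C': 0.091}
```

Adding a fourth or fifth cell would only mean adding entries to the two dictionaries.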
So what have you learned? In Bayes Rule, there can be more than just 2 underlying causes like cancer/non-cancer. There might be 3, 4, or 5, any number. We can apply exactly the same math, but we have to keep track of more values. In fact, the robot might also have more than just 2 test outcomes. Here it was red or green, but it could be red, green, or blue. And this means that our measurement probability will be more elaborate. I'd have to give you more information, but the math remains exactly the same. We can now deal with very large problems that have many possible hidden causes of where the world might be, and we can still apply Bayes Rule to find all of these numbers. Let me give you one final test.
This test is actually taken directly from my life, and you'll smile when you see my problem. I used to travel a lot. It was so bad for a while that I would find myself in a bed not knowing what country I was in. I kid you not. So let's say I'm gone 60% of my time and I'm at home only 40% of my time. Now in summer, I live in California, and it truly doesn't rain in the summer, whereas in many of the countries I have traveled to, there's a much higher chance of rain. So let's now say I'm lying in bed, and I wake up and I open the window and I see it's raining. Let's now apply Bayes rule. What do you think is the probability that I'm home now that I see it's raining? Just give me this one number.
And I get 0.0217, which is a really small number. The way I get there is by taking the probability of being home times the probability of rain at home, and normalizing it using the same number over here plus the same calculation for being gone, which is 0.6 times the probability of rain when I'm gone, 0.3. That results in 0.0217, or a little over 2%. Did you get this? If so, you now understand something that's really interesting: you're able to look at a hidden variable and understand how a test can give you information back about this hidden variable. And that's really cool, because it allows you to apply the same scheme to a great many practical problems in the world. Congratulations! In our next unit, which is optional, I'd like you to program all of this, so you can try the same thing in an actual programming interface and write software that implements things such as Bayes rule. But not to worry, this is optional. If you don't know how to program, just skip the next unit.
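The rain calculation can be reproduced with one caveat: these notes do not state the probability of rain at home. The sketch below assumes a value of 0.01, which is an assumption back-solved from the stated answer of 0.0217 and not a number given in the text. The other values (0.4, 0.6, 0.3) are the ones mentioned above.

```python
p_home = 0.4              # at home 40% of the time
p_gone = 0.6              # gone 60% of the time
p_rain_given_gone = 0.3   # chance of rain while traveling

# ASSUMPTION: not stated in the notes. 0.01 is chosen because it
# reproduces the stated answer of about 0.0217.
p_rain_given_home = 0.01

joint_home_rain = p_home * p_rain_given_home   # about 0.004
joint_gone_rain = p_gone * p_rain_given_gone   # about 0.18

# Total probability of seeing rain.
p_rain = joint_home_rain + joint_gone_rain     # about 0.184

p_home_given_rain = joint_home_rain / p_rain
print(round(p_home_given_rain, 4))             # 0.0217
```

With a different rain-at-home probability, the posterior would change, but the steps stay the same.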