
**These are draft notes from subtitles. Please help to improve them. Thank you!**

Contents

- 1 ST101 Unit 2
- 2 Probability
- 2.1 Flipping Coins
- 2.2 Flipping Coins Solution
- 2.3 Fair Coin
- 2.4 Fair Coin Solution
- 2.5 Loaded Coin 1
- 2.6 Loaded Coin 1 Solution
- 2.7 Loaded Coin 2
- 2.8 Loaded Coin 2 Solution
- 2.9 Loaded Coin 3
- 2.10 Loaded Coin 3 Solution
- 2.11 Complementary Outcomes
- 2.12 Two Flips 1
- 2.13 Two Flips 1 Solution
- 2.14 Two Flips 2
- 2.15 Two Flips 2 Solution
- 2.16 Two Flips 3
- 2.17 Two Flips 3 Solution
- 2.18 Two Flips 4
- 2.19 Two Flips 4 Solution
- 2.20 Two Flips 5
- 2.21 Two Flips 5 Solution
- 2.22 One Head 1
- 2.23 One Head 1 Solution
- 2.24 One Head 2

- 3 Conditional Probability
- 3.1 Cancer Example 1 Solution
- 3.2 Dependent Things
- 3.3 Cancer Example 2
- 3.4 Cancer Example 2 Solution
- 3.5 Cancer Example 3
- 3.6 Cancer Example 3 Solution
- 3.7 Cancer Example 4 Solution
- 3.8 Cancer Example 5
- 3.9 Cancer Example 5 Solution
- 3.10 Cancer Example 6
- 3.11 Cancer Example 6 Solution
- 3.12 Cancer Example 4
- 3.13 Cancer Example 8
- 3.14 Cancer Example 7
- 3.15 Cancer Example 7 Solution
- 3.16 Cancer Example 8 Solution
- 3.17 Total Probability
- 3.18 Two Coins 1
- 3.19 Two Coins 1 Solution
- 3.20 Two Coins 2
- 3.21 Two Coins 3
- 3.22 Two Coins 2 Solution
- 3.23 Two Coins 3 Solution
- 3.24 Two Coins 4

- 4 Bayes Rule
- 4.1 Cancer Test
- 4.2 Cancer Test Solution
- 4.3 Prior and Posterior
- 4.4 Prior and Posterior Solution
- 4.5 Normalizing 2
- 4.6 Normalizing 2 Solution
- 4.7 Normalizing 3
- 4.8 Normalizing 3 Solution
- 4.9 Normalizing 4
- 4.10 Normalizing 4 Solution
- 4.11 Total Probability
- 4.12 Total Probability Solution
- 4.13 Bayes Rule Diagram
- 4.14 Equivalent Diagram
- 4.15 Cancer Probabilities
- 4.16 Cancer Probabilities Solution
- 4.17 Probability Given Test
- 4.18 Probability Given Test Solution
- 4.19 Normalizer
- 4.20 Normalizer Solution
- 4.21 Normalizing Probability
- 4.22 Normalizing Probability Solution
- 4.23 Disease Test 1
- 4.24 Disease Test 1 Solution

- 5 Programming Bayes Rule
- 5.1 Printing Number Solution
- 5.2 Functions
- 5.3 Functions Solution
- 5.4 Complement
- 5.5 Complement Solution
- 5.6 Two Flips
- 5.7 Two Flips Solution
- 5.8 Three Flips
- 5.9 Three Flips Solution
- 5.10 Flip Two Coins
- 5.11 Flip Two Coins Solution
- 5.12 Flip One of Two
- 5.13 Flip One of Two Solution
- 5.14 Program Flipping
- 5.15 Program Flipping Solution
- 5.16 Cancer Example 1
- 5.17 Cancer Example 1 Solution
- 5.18 Cancer Example 1 Solution
- 5.19 Calculate Total
- 5.20 Cancer Example 2
- 5.21 Calculate Total Solution
- 5.22 Cancer Example 2 Solution
- 5.23 Program Bayes Rule
- 5.24 Program Bayes Rule Solution

- 6 Correlation vs Causation
- 6.1 Mortality
- 6.2 Mortality Solution
- 6.3 Deciding
- 6.4 Deciding Solution
- 6.5 Assuming Causation
- 6.6 Considering Health
- 6.7 Considering Health Solution
- 6.8 Correlation 1
- 6.9 Correlation 1 Solution
- 6.10 Correlation 2
- 6.11 Correlation 2 Solution
- 6.12 Correlation 3
- 6.13 Correlation 3 Solution
- 6.14 Causation Structure
- 6.15 Fire Correlation
- 6.16 Fire Correlation Solution
- 6.17 Fire Causation
- 6.18 Fire Causation Solution
- 6.19 Assignment

- 7 Supplementary Resources
- 7.1 Bayes' Theorem

I have here a U.S. dollar coin. It has two sides, one showing a head and one showing what's called tails. In probability, I'm giving a description of this coin, and I'm making data. [sound of coin spinning] So if we look at the coin, it came up heads. I just made a data point by flipping the coin once, and it came up heads. Let me do it again. [sound of coin spinning] And--wow! It came up heads again. So my new data is {heads, heads}. And you can see how this relates to the data we studied before when we talked about histograms and pie charts and so on. Let me give it a third try. [sound of coin spinning] And, unbelievably, it comes up heads once again. So let me ask a statistical question to test your intuition. Do you think that if I spin this coin more often, it will always come up heads? And say I try to spin it as fairly as I possibly can.

And you can debate it, but I think the best answer is no. This is what's called a fair coin, and that means it really has a 50% chance of coming up tails. So let me spin it again. [sound of coin spinning] And, not surprisingly, it actually came up tails this time. So probability is a method of describing the anticipated outcome of these coin flips.

Let's talk about a fair coin. The probability of the coin coming up heads is written in this P notation. This reads probability of the coin coming up heads. And in a fair coin, the chances are 50%. That is, in half the coin flips, the coin should come up heads. In probability we often write 0.5, which is half of 1. So a probability of 1 means it always occurs. A probability of 0.5 means it occurs half the time. And let me just ask you what do you think, for this coin, is the probability of tails?

And I would say the answer is 0.5. Let me now go to a coin that is what is called "loaded."

A loaded coin is one that comes up with one of the two sides much more frequently than the other. So, for example, suppose I have a coin that always comes up heads. What probability would I assess for this coin coming up heads? What would be the right number over here?

And the number is 1. That's the same as 100%. 1 just means it always comes up heads.

And, given that, what number would you now assess the probability of tails to be?

And, yes, the answer is zero. And we find a little law here we just want to point out, which is that the probability of heads plus the probability of tails equals 1. And the reason why that's the case is that the coin either comes up heads or tails. There is no other choice. So no matter what happens, if I look at heads and tails combined, the chance of either of those occurring is 1, because we know it's going to happen. So we can use this law to compute the probability of tails for other examples.

So suppose the probability of heads is 0.75, that is, 3 out of 4 times we're going to get heads. What is the probability of tails?

And the answer is 0.25, which is 1 - 0.75 using the law down here. As you can verify, 0.75 + 0.25 = 1.

So we just learned something important. There's a probability for an outcome; I'm going to call it A, for now. And we learned that the probability of the opposite outcome, which we're going to call ¬A (the ¬ over here just means "not"), is 1 minus the probability of A: P(¬A) = 1 - P(A). That's a very basic law of probability, which will come in handy as we go forward, so please remember it.
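As a preview of the programming unit later on, this law fits in a one-line Python function (the name `complement` is my own, not from the course):

```python
def complement(p):
    """Probability of the opposite outcome: P(not A) = 1 - P(A)."""
    return 1 - p

print(complement(0.75))  # prints 0.25
```

Plugging in 0.75 for heads gives 0.25 for tails, matching the exercise above.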

In our example, we observed heads twice. So now I want to ask you a really tricky question: What's the probability of observing heads and heads if you flip the same unbiased coin twice? This means in each flip we assume the probability of heads is 0.5. Please answer here.

That was a tricky question, and you couldn't really know the answer if you've never seen probability before, but the answer is 0.25. And I will derive it for you using something called a truth table. In a truth table, you draw out every possible outcome of the experiment that you conducted. There were two coin flips--flip 1 and flip 2--and the possible outcomes were heads, heads; heads, tails; tails, heads; and tails, tails. So when we look at this table, you can see every possible outcome of these two coin flips. There happen to be four of them. And I would argue that because heads and tails are equally likely, each of these outcomes is equally likely. Because we know that the probabilities of all outcomes have to add up to 1, we find that each outcome has a chance of a quarter, or 0.25. Another way to look at this is that the probability of heads followed by heads is a product: the chance of the first outcome being heads multiplied by the probability of the second outcome being heads. The first is 0.5, as is the second. And if you multiply these two numbers, it's 0.25, or a quarter.
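The truth table for two fair flips can be enumerated in a few lines of Python (a sketch; the variable names are mine, not from the course):

```python
from itertools import product

p_heads = 0.5  # fair coin

# Build the truth table: every possible outcome of two independent flips.
outcomes = {}
for first, second in product("HT", repeat=2):
    p1 = p_heads if first == "H" else 1 - p_heads
    p2 = p_heads if second == "H" else 1 - p_heads
    outcomes[(first, second)] = p1 * p2  # product rule for independent flips

print(outcomes[("H", "H")])    # prints 0.25
print(sum(outcomes.values()))  # prints 1.0 -- all cases together are certain
```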

Let me now challenge you and give you a loaded coin I flipped twice. And for this loaded coin, I assumed the probability of heads is 0.6. That really changes all of the numbers in the table so far, but you can apply the same method of truth tables to arrive at an answer for what is the probability of seeing heads twice under the assumption that the probability of heads equals 0.6? And I want to do this in steps, so rather than asking the question directly, let me help you derive it by first asking: What's the probability of tails?

And the answer is 0.4 because heads comes up 0.6, and 1 - 0.6 = 0.4.

Let's now fill out the entire truth table. There are four values over here, so please compute them for me.

And the answer using our product rule is heads, heads comes out to 0.6 * 0.6, which is 0.36. Heads followed by tails is 0.6 * 0.4, which is 0.24. Tails followed by heads is, again, 0.24. And tails followed by tails is 0.16, which is 0.4 * 0.4.
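The same computation works for the loaded coin; here is a small helper function (my own name and structure, not the course's code):

```python
def two_flip_table(p_heads):
    """Truth table for two independent flips of a coin with P(heads) = p_heads."""
    p_tails = 1 - p_heads
    return {
        ("H", "H"): p_heads * p_heads,
        ("H", "T"): p_heads * p_tails,
        ("T", "H"): p_tails * p_heads,
        ("T", "T"): p_tails * p_tails,
    }

table = two_flip_table(0.6)
print(table[("H", "H")])  # ≈ 0.36
print(table[("H", "T")])  # ≈ 0.24
print(table[("T", "T")])  # ≈ 0.16
```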

If you add up these numbers over here-- please go ahead and add them up and tell me what the sum of those numbers is.

And, not surprisingly, it's 1. That is, the truth table always has probabilities that add up to 1, because it considers all possible cases, and all possible cases together have a probability of 1. So we just checked this and made sure it's correct. Reading from this table, we find that the probability of (H,H) is 0.36. And you can do the same over here: 0.6 * 0.6 = 0.36. So that's our correct answer.

Let's now go to the extreme, and this is a challenging probability question. Suppose the probability of heads is 1, so my coin always comes up with heads. What is the probability of (H,H)?

And the answer is 1. To see this, we know that the probability of tails is 0. All the probability goes to heads: 1 * 1 = 1, 1 * 0 = 0, 0 * 1 = 0, and 0 * 0 = 0. And it's easy to verify that all these things add up to 1. Our (H,H) is just 1.

The truth table gets more interesting when we ask different questions. Suppose we flip our coin twice. What we care about is that exactly one of the two things is heads, and thereby exactly the other one is tails. For a fair coin, what do you think the probability would be that if I flip it twice we would see heads exactly once?

And the answer is 0.5, and this is a nontrivial question. Let's do the truth table. So, for flip 1, we have the outcomes heads, heads, tails, tails. For flip 2: heads and tails and heads and tails. These are all possible outcomes. And we know that for the fair coin each outcome is equally likely--that is, exactly one quarter.

Given that, we now have to associate a truth table with the question we're asking. So where exactly is, in the outcome, heads represented once? Please check the corresponding cases.

And the answer is 0.9, which is just 1 minus the probability of cancer.

In real life, things depend on each other. Say you can be born smart or dumb, and for the sake of simplicity, let's assume whether you're smart or dumb is just nature's flip of a coin. Now, whether you become a professor at Stanford is not entirely independent of that. I would argue becoming a professor at Stanford is generally not very likely, so the probability might be 0.001, but it also depends on whether you're born smart or dumb. If you are born smart, the probability might be larger, whereas if you're born dumb, the probability might be markedly smaller. Now, this is just an example, but you can think of it as two consecutive coin flips. The first is whether you are born smart or dumb. The second is whether you get a certain job at a certain time. And these two coin flips are not independent anymore. So whereas in our last unit we assumed that the coin flips were independent--that is, the outcome of the first didn't affect the outcome of the second--from now on we're going to study the more interesting cases where the outcome of the first does impact the outcome of the second, and to do so we need more variables to express these cases.

Of course, in reality, we don't know whether a person has cancer, but we can run a test, like a blood test. The outcome of the blood test may be positive or negative, but like any good test, it tells me something about the thing I really care about--whether the person has cancer or not. Let's say that if the person has cancer, the test comes up positive with probability 0.9, and that implies that if the person has cancer, the negative outcome has probability 0.1, because these two things have to add up to 1. I've just given you a fairly complicated notation that says the outcome of the test depends on whether the person has cancer or not. That is more complicated than everything else we've talked about so far. We call this thing over here a conditional probability, and the way to understand it is through its funny notation. There's a bar in the middle, and the bar says: what's the probability of the stuff on the left, given that we assume the stuff on the right is actually the case? Now, in reality, we don't know whether the person has cancer or not, and in a later unit we're going to reason about whether the person has cancer given a certain data set, but for now, we assume we have god-like capabilities: we can tell with absolute certainty whether the person has cancer, and we can determine what the outcome of the test is. This is a test that isn't exactly deterministic--it makes mistakes--but it only makes a mistake in 10% of the cases, as illustrated by the 0.1 down here. Now, it turns out I haven't fully specified the test. The same test might also be applied to a situation where the person does not have cancer. This little thing over here, ¬C, is my shortcut for not having cancer. And now let me say the probability of the test giving me a positive result--a false positive--when there's no cancer is 0.2. You can now tell me: what's the probability of a negative outcome in case we know for a fact the person doesn't have cancer? So please tell me.

And the answer is 0.8. As I'm sure you noticed, in the case where there is cancer, the possible test outcomes add up to 1. In the case where there isn't cancer, the possible test outcomes also add up to 1. So 1 - 0.2 = 0.8.

Look at this--this is very nontrivial. But armed with this, we can now build up the truth table for all combinations of the two different variables: cancer or no cancer, and positive or negative test. So let me write down cancer and test, and let me go through the different possibilities. We could have cancer or not, and the test may come up positive or negative. So please give me the probability of the combination for the very first one. As a hint, it's the same kind of thing as before, where we multiply two things, but you have to find the right things to multiply in this table over here. This is not an easy question.

And the answer: the probability of cancer is 0.1, the probability of the test being positive given cancer is the one over here, 0.9, and multiplying those two together gives us 0.09.

And once again, we refer to the corresponding numbers over here on the right side: 0.1 for cancer times the probability of getting a negative result conditioned on having cancer, and that is 0.1 * 0.1, which is 0.01.

Moving on to the next case. What do you think the answer is?

And here the answer is 0.18, obtained by multiplying the probability of not having cancer, which is 0.9, with the probability of getting a positive test result for a non-cancer patient, 0.2. Multiplying 0.9 with 0.2 gives me 0.18.

Let's just quickly do the final one, because it's the most likely one.

Here you get 0.72, which is the product of not having cancer in the first place, 0.9, and the probability of getting a negative test result under the condition of not having cancer, 0.8.

Moving to the next case--what do you think the probability is that the person does have cancer but the test comes back negative? What's the combined probability of these two cases?

Now let me ask you a really tricky question. What is the probability of a positive test result? Can you sum or determine, irrespective of whether there's cancer or not, what is the probability you get a positive test result?

Now quickly, do me a favor and add all of those up. What do you get?

And, as usual, the answer is 1. That is, the truth table covers all possible cases. When you add up the probabilities, you should always get an answer of 1.

And the result, once again, follows from the truth table, which is why this table is so powerful. Let's look at where in the truth table we get a positive test result. I would say it is right here and right here. If we take the corresponding probabilities, 0.09 and 0.18, and add them up, we get 0.27, and that's the correct answer for getting a positive result.
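The whole cancer/test truth table from this section can be rebuilt in a few lines of Python (a sketch with my own variable names):

```python
p_cancer = 0.1
p_pos_given_cancer = 0.9     # test is positive given cancer
p_pos_given_no_cancer = 0.2  # false-positive rate

# Joint probabilities: P(cancer, test) = P(cancer) * P(test | cancer)
p_c_pos = p_cancer * p_pos_given_cancer                   # ≈ 0.09
p_c_neg = p_cancer * (1 - p_pos_given_cancer)             # ≈ 0.01
p_nc_pos = (1 - p_cancer) * p_pos_given_no_cancer         # ≈ 0.18
p_nc_neg = (1 - p_cancer) * (1 - p_pos_given_no_cancer)   # ≈ 0.72

# Total probability of a positive test: sum the rows where the test is positive.
print(p_c_pos + p_nc_pos)  # ≈ 0.27
# Sanity check: the whole truth table adds up to 1.
print(p_c_pos + p_c_neg + p_nc_pos + p_nc_neg)  # ≈ 1.0
```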

Putting all of this into mathematical notation: we are given the probability of having cancer, P(C), from which follows the probability of not having cancer, P(¬C) = 1 - P(C). We are also given two conditional probabilities for the test being positive. From P(Pos|C), the probability of a positive test given cancer, we can infer the probability of the test being negative given cancer, P(Neg|C) = 1 - P(Pos|C). And from P(Pos|¬C), the probability of the test being positive in the cancer-free case, we can compute the probability of a negative test result in the cancer-free case, P(Neg|¬C) = 1 - P(Pos|¬C). So these things are easily inferred by the 1-minus rule. From these, we compute the probability of a positive test result as the sum of the probability of a positive test result given cancer times the probability of cancer--which is our truth-table entry for the combination of Pos and C--plus the same term for no cancer: P(Pos) = P(Pos|C) P(C) + P(Pos|¬C) P(¬C). The notation looks confusing and complicated when you first enter probability--this is called total probability--but it's useful to know that it is very, very intuitive. To develop the equation further, let me give you another exercise of exactly the same type.

This time around, we have a bag, and in the bag are 2 coins, coin 1 and coin 2. In advance, we know that coin 1 is fair: the probability of coin 1 coming up heads is 0.5. Coin 2 is loaded: the probability of coin 2 coming up heads is 0.9. Quickly, give me the probability of coming up tails for coin 1 and for coin 2.

And the answer is 0.5 for coin 1 and 0.1 for coin 2, because these things have to add up to 1 for each of the coins.

So now what happens is I'm going to remove one of the coins from this bag, and each coin, coin 1 or coin 2, is picked with equal probability. Let me now flip that coin once, and I want you to tell me: what's the probability that this coin--which has a 50% chance of being the fair coin and a 50% chance of being the loaded coin--comes up heads? Again, this is an exercise in conditional probability.

Now let me up the ante by flipping the coin twice. Once again, I'm drawing a coin from the bag, picking each with 50% chance. I don't know which one I have picked; it might be fair or loaded. I then flip it twice--I draw the coin once and flip that same coin twice--and I get first heads, then tails. What's the probability of seeing heads first and then tails? Again, you might derive this using truth tables.

Let's do the truth table. We have a pick event followed by a flip event. We can pick coin 1 or coin 2; there is a 0.5 chance for each of the coins. Then we can flip and get heads or tails for the coin we've chosen. Now, what are the probabilities? I'd argue we pick coin 1 at 0.5, and once I pick the fair coin, I know that the probability of heads is, once again, 0.5, which makes it 0.25. The same is true for the fair coin and expecting tails. But if we pick the unfair coin, with a 0.5 chance, we get a 0.9 chance of heads, so 0.5 times 0.9 gives you 0.45; whereas for the unfair coin the probability of tails is 0.1, which multiplied by the probability of picking it, 0.5, gives us 0.05. Now, when I ask you what's the probability of heads, we find that 2 of those cases indeed come up heads, so we add 0.25 and 0.45 and get 0.7. So in this example there is a 0.7 chance that we generate heads.
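This pick-then-flip calculation is a direct application of total probability; a sketch in Python (names are mine):

```python
p_pick_fair = 0.5   # chance of drawing the fair coin from the bag
p_heads_fair = 0.5
p_heads_loaded = 0.9

# P(heads) = P(pick fair) * P(heads | fair) + P(pick loaded) * P(heads | loaded)
p_heads = p_pick_fair * p_heads_fair + (1 - p_pick_fair) * p_heads_loaded
print(p_heads)  # ≈ 0.7
```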

This is a non-trivial question, and the right way to do it is to go through the truth table, which I've drawn over here. There are 3 different things happening. We have the initial pick of the coin, which can be coin 1 or coin 2 with equal probability; then we flip it for the first time, with heads or tails as outcomes; and we flip it for a second time, with a second outcome. These different cases summarize my truth table. I now need to observe just the cases where heads is followed by tails: this one right here and over here. Then we compute the probability for those 2 cases. The probability of picking coin 1 is 0.5. For the fair coin, we get 0.5 for heads, followed by 0.5 for tails. Multiplying them together gives 0.125. Let's do the second case. There's a 0.5 chance of picking coin 2. Now, that one comes up with heads at 0.9 and with tails at 0.1. Multiplying these together gives us 0.045, a smaller number than up here. Adding these 2 things together results in 0.17, which is the right answer to the question over here. That was really non-trivial, and I'd be amazed if you got this correct.
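The same truth-table sum can be sketched in Python (my own names, not course code):

```python
p_pick = {"coin1": 0.5, "coin2": 0.5}   # chance of drawing each coin
p_heads = {"coin1": 0.5, "coin2": 0.9}  # P(heads) for each coin

# P(H then T) = sum over coins of P(pick coin) * P(H | coin) * P(T | coin)
p_h_then_t = sum(p_pick[c] * p_heads[c] * (1 - p_heads[c]) for c in p_pick)
print(p_h_then_t)  # ≈ 0.17
```

Note that the coin is drawn only once, so both flips use the same coin's bias; that is why the two flips are not independent overall.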

Let me do this once again. There are 2 coins in the bag, coin 1 and coin 2. And, as before, we pick coin 1 with 0.5 probability. But now I'm telling you that coin 1 is loaded, so it gives you heads with probability 1. Think of it as a coin that only has heads. And coin 2 is also loaded: it gives you heads with 0.6 probability. Now work out for me, in this experiment, what's the probability of seeing tails twice?

Let's use the cancer example from my last unit. There's a specific cancer that occurs in 1% of the population, and a test for this cancer that with 90% chance comes up positive if you have this cancer, C. That's usually called the sensitivity. But the test sometimes comes up positive even if you don't have C. Let's say with 90% chance it's negative if you don't have C. That's usually called the specificity. So here's my question. Without further symptoms, you take the test, and the test comes back positive. What do you think is the probability of having that specific type of cancer? To answer this, let's draw a diagram. Suppose these are all the people, and some of them, exactly 1%, have cancer; 99% are cancer-free. We know there's a test that, if you have cancer, correctly diagnoses it with 90% chance. So if we draw the area where the test is positive--cancer and test positive--then this area over here is 90% of the cancer circle. However, this isn't the full truth. The test also comes out positive even if the person doesn't have cancer. In fact, in our case, that happens in 10% of all cases. So we have to add more area: 10% of this large area, where the test comes up positive but the person doesn't have cancer. So this blue area is 10% of all the area over here minus the little cancer circle. And clearly, all the area outside these circles corresponds to the situation of no cancer and a negative test. So let me ask you again. Suppose we have a positive test--what do you think? With a prior probability of cancer of 1% and a sensitivity and specificity of 90%, do you think your new chances are now 90%, or 8%, or still just 1%?

And I would argue it's about 8%. In fact, as we will see, it comes out at 8 1/3% mathematically. And the way to see this in the diagram is that this is the region that tests positive. By having a positive test, you know you're in this region, and nothing else matters. You know you're in this circle. But within this circle, the ratio of the cancerous region to the entire region is still pretty small. Obviously, having a positive test increases your cancer probability, but it only increases it by a factor of about 8, as we will see in a second.

So this is the essence of Bayes Rule, which I'll give to you in a second. There's some sort of a prior, by which we mean the probability before you run the test, and then you get some evidence from the test itself, and that all leads you to what's called a posterior probability. Now, this is not really a plus operation--in reality, it's more like a multiplication--but semantically, what Bayes Rule does is incorporate some evidence from the test into your prior probability to arrive at a posterior probability. So let's make this specific. In our cancer example, we know that the prior probability of cancer is 0.01, which is the same as 1%. The posterior probability of cancer given that our test is positive--abbreviated here as Pos--is the product of the prior times our test sensitivity, which is the chance of a positive result given that I have cancer. And you might remember this was 0.9, or 90%. Now, just to warn you, this isn't quite correct yet. To make it correct, we also have to compute the same quantity for the non-cancer option: no cancer given a positive test. Using the prior, we know that P(¬C) is 0.99--it's 1 minus P(C)--and we multiply it by the probability of getting a positive test result given ¬C. Realize these 2 equations are the same, but I exchanged C for ¬C. And this one over here takes a moment to compute. We know that our test gives us a negative result in the cancer-free case with a 0.9 chance; as a result, it gives us a positive result in the cancer-free case with a 10% chance. What's interesting is that this is about the correct equation, except the probabilities don't add up to 1. I'm going to ask you to compute them, so please give me the exact numbers for the first expression and the second expression written over here, using our example up there.

Obviously, P(C) times 0.9 is 0.01 * 0.9 = 0.009, whereas 0.99 * 0.1, this guy over here, is 0.099. What we've computed here is the absolute area in here, which is 0.009, and the absolute area in here, which is 0.099.

The normalization proceeds in two steps. We normalize these guys so as to keep their ratio the same but make sure they add up to 1. So let's first compute the sum of these two guys. Please let me know what it is.

And, yes, the answer is 0.108. Technically, what this really means is the probability of a positive test result--that's the area in the circle that I just marked. By virtue of what we learned last time, it's just the sum of the two things over here, which gives us 0.108.

And now, finally, we come to the actual posterior, whereas this one over here is often called the joint probability of two events. The posterior is obtained by dividing this guy over here by the normalizer. So let's do this over here--let's divide this guy over here by the normalizer to get my posterior probability of having cancer given that I received the positive test result. So divide this number by this one.

And we get 0.0833.

Let's do the same for the non-cancer version: pick the number over here and divide it by the same normalizer.

The answer is 0.9167 approximately.

Why don't you for a second add these two numbers and give me the result?

And the answer is 1, as you would expect. Now, this was really challenging. You can see a lot of math on this slide. Let me just go over it again and make it much, much easier for you.

Well, we really said that we have a situation with a prior P(C), a test with a certain sensitivity P(Pos|C), and a certain specificity P(Neg|¬C). When you receive, say, a positive test result, what you do is take your prior P(C) and multiply in the probability of this test result given C; and you take the prior P(¬C) and multiply in the probability of the test result given ¬C. This is your branch for the hypothesis that you have cancer, and this is your branch for the hypothesis of no cancer. When you're done with this, you arrive at a number that combines the cancer hypothesis with the test result, and likewise for the no-cancer hypothesis. Now, what you do is add those up--and they normally don't add up to one. You get a certain quantity, which happens to be the total probability that the test is what it was, in this case positive. And all you do next is divide, or normalize, this thing over here by the sum over here, and the same on the right side. The divisor is the same for both cases: this is your cancer branch and this is your non-cancer branch, but the normalizer does not depend on the cancer variable anymore. What you get out is the desired posterior probability, and those add up to 1 if you did everything correctly, as shown over here. This is the algorithm for Bayes Rule.
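The algorithm just described translates almost line for line into Python. This is a sketch under my own naming (the function `bayes_rule` and its parameters are not from the course):

```python
def bayes_rule(p_c, sensitivity, specificity, test_positive=True):
    """Posterior P(C | test result) via the multiply-then-normalize recipe."""
    if test_positive:
        joint_c = p_c * sensitivity                    # cancer branch
        joint_not_c = (1 - p_c) * (1 - specificity)    # non-cancer branch
    else:
        joint_c = p_c * (1 - sensitivity)
        joint_not_c = (1 - p_c) * specificity
    normalizer = joint_c + joint_not_c  # total probability of the test result
    return joint_c / normalizer

# Cancer example: prior 0.01, sensitivity 0.9, specificity 0.9, positive test.
print(round(bayes_rule(0.01, 0.9, 0.9, test_positive=True), 4))  # prints 0.0833
```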

Now, the same algorithm works if your test comes back negative. We'll practice this in just 1 second. Suppose your test result is negative. You can still ask the same question: what's my probability of having cancer or not? But now all the positives in here become negatives. The sum is the total probability of a negative test result, and if we now divide by this quantity, you get the posterior probabilities for cancer and non-cancer assuming you had a negative test result--which of course will be much, much more favorable for you, because none of us wants to have cancer. So look at this for a while, and let's now do the calculation for the negative case using the same numbers I gave you before, step by step this time, so I can really guide you through the process.

We begin with our prior probability, our sensitivity, and our specificity, and I want you to begin by filling in all the missing values: the probability of no cancer, the probability of negative (which is the negation of positive) given C, and the probability of positive given ¬C.

And obviously these are 0.99, as before, then 0.1 and 0.1. I hope you got this correct.

Now assume the test comes back negative; the same logic applies as before. So please give me the joint probability of cancer and the negative test result, and the joint probability of being cancer-free and the negative test result.

The number here is 0.001, and it's the product of my prior for cancer, which is 0.01, and the probability of getting a negative result in the case of cancer, which is right over here, 0.1. If I multiply these two things together, I get 0.001. The probability here is 0.891. What I'm multiplying is the prior probability of not having cancer, 0.99, with the probability of seeing a negative result in the case of not having cancer, and that is the one right over here, 0.9. So if we multiply 0.99 with 0.9, we actually get 0.891.

Let's compute the normalizer. You now remember what this was.

And the answer is 0.892. You just add up these two values over here.

And now, finally, tell me: what is the posterior probability of cancer given that we had a negative test result, and the posterior probability of no cancer given a negative test result? Please give me the numbers here.

This is approximately 0.0011, which we get by dividing 0.001 by the normalizer 0.892, and the posterior probability of being cancer-free after the test is approximately 0.9989, obtained by dividing this probability over here by the normalizer. Not surprisingly, these two values indeed add up to 1. Now, what's remarkable about this outcome is what it means. Before the test, we had a 1% chance of having cancer; now we have about a 0.11% chance of having cancer. So our cancer probability went down by about a factor of 9. The test really helped us gain confidence that we are cancer-free. Conversely, before we had a 99% chance of being cancer-free; now it's 99.89%. So all the numbers work exactly how we expect them to.
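A quick check of this negative-test calculation in Python (a standalone sketch; the variable names are mine):

```python
p_c = 0.01               # prior probability of cancer
p_neg_given_c = 0.1      # 1 - sensitivity
p_neg_given_not_c = 0.9  # specificity

joint_c = p_c * p_neg_given_c                 # ≈ 0.001
joint_not_c = (1 - p_c) * p_neg_given_not_c   # ≈ 0.891
normalizer = joint_c + joint_not_c            # ≈ 0.892, P(Neg)

print(round(joint_c / normalizer, 4))      # prints 0.0011
print(round(joint_not_c / normalizer, 4))  # prints 0.9989
```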

Let me now make your life harder. Suppose the probability of a certain other kind of disease is 0.1, so 10% of the population has it. Sick people have a 0.9 probability of testing positive, so the test is very informative, but there's only a 0.5 chance that if I'm disease-free the test indeed says so. So the sensitivity is high, but the specificity is lower. Let's start by filling in the first 3 of them.

Obviously, these are just 1 minus those: 0.9, 0.1, and 0.5. Notice that these two numbers may very well be different; there is no contradiction here. These have to add up to 1: given ¬C, the probabilities of positive and negative have to add up to 1. But those don't. It takes a lot of practice to understand which numbers have to add up to 1, but I set it up in a way that you should have gotten it right.

And not surprisingly, when you hit run, you should get 0.3 in the output window.

We will now give a slightly more complicated problem of the form we're going to be using, and I hope you won't be too confused. There are two parts to it. There's the print command as before, but rather than printing 0.3 directly, we're going to print a function that computes something from 0.3. Right now it's the identity, so it prints out exactly the same value, but we're doing this to set ourselves up to print something else. And here is how f is defined: we define f with the argument p, which will be set to 0.3, to be just a function that returns exactly the same value. Why do we do this? Well, to practice programming. So hit the run button and see what happens.

And once again, we get 0.3, and the reason is, as we go up, 0.3 is being funneled into the function f. f starts over here, and p is now 0.3. We return it, and the value is 0.3 straight from the input, and then this return value is printed. Sounds complex, but from now on, all I want you to do is modify what's inside this function.
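The setup described above might look like this (a minimal sketch; the function name f follows the lesson, written here with a Python 3 print call):

```python
def f(p):
    # Identity for now: return the input probability unchanged.
    return p

print(f(0.3))  # prints 0.3
```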

So for the first exercise, if this is the probability, let's print the probability of the inverse event. Let's make the function over here take p but return 1 - p. So please go ahead and modify this code such that the return value is 1 - p and not p.

This modification just replaces p by 1 - p in the return statement, and when I run it, I get 0.7. The nice thing about our complementary probability is that we can now plug in a different value over here, say 0.1; with 0.1, I get 0.9 as the output. So congratulations, you've implemented your very first example of probability, where the event probability is 0.1 and the complementary event, the negation of it, is encapsulated in this function over here.
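The solution just described, as a sketch:

```python
def f(p):
    # Probability of the complementary event.
    return 1 - p

print(f(0.3))  # 0.7
print(f(0.1))  # 0.9
```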

Here is my next quiz for you. Suppose we have a coin with probability p; for example, p might be 0.5. You flip the coin twice, and I want you to compute the probability that this coin comes up heads in both flips. Obviously that's 0.5 times 0.5, but I want to do it in a way that works for any arbitrary value of p, using the same style of code as before. So all you're going to do is modify the 1 - p into something that, given the probability p of heads, returns the probability of seeing heads twice with this coin.

And here is one way to implement this: just return p * p. When I hit the run button for 0.5, it gives me 0.25. If I make this a loaded coin with a probability of heads of 0.1, then the outcome is 0.01.
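A sketch of that one-line change:

```python
def f(p):
    # Probability of heads on both of two independent flips.
    return p * p

print(f(0.5))  # 0.25
print(f(0.1))  # 0.01 (up to floating-point rounding)
```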

So let's up the ante and say we have a coin that has a certain probability of coming up heads; again, it might be 0.5. Just like before, it will be an input to the function f, but now I'm going to flip the coin 3 times, and I want you to calculate the probability that heads comes up exactly once. Three is not a variable, so your code only needs to work for 3 flips, not for 2 or 4; the only input variable is the coin probability, 0.5. So please change this code to express that number.

You might remember that for p = 0.5, if you go to the truth table, you'll find the answer is 0.375. If you set p to 0.8, the number actually goes down, to 0.096. So you can check your implementation to see if you get exactly the same numbers. Here's my result: when you build the truth table, you'll find that exactly 3 of the possible outcomes have heads exactly once; they are H T T, T H T, and T T H. So, of the 8 possible outcomes of the coin flips, those 3 are the ones you want to count. Now, each has exactly the same probability, p for the heads × (1 - p) × (1 - p). So to get all 3 of them together, we just multiply this by 3. And this is how it looks in the source code: 3 × p × (1 - p) × (1 - p). If, for example, I give this the input 0.8, then I get 0.096 as an answer. If you've never programmed before and you got this right, then congratulations! You might actually be a programmer. Obviously, if you've programmed before, this should be relatively straightforward, but it's fun to practice. Let's now go to a case of maybe 2 coins.
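The truth-table reasoning above, as a sketch:

```python
def f(p):
    # Exactly one head in three flips: the outcomes HTT, THT, TTH
    # each occur with probability p * (1 - p) * (1 - p).
    return 3 * p * (1 - p) * (1 - p)

print(f(0.5))  # 0.375
print(f(0.8))  # 0.096 (up to floating-point rounding)
```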

So coin 1 has a probability of heads equal to P₁, and coin 2 has a probability of heads equal to P₂, and these may now be different probabilities. In a programming environment, we can accommodate this by giving 2 arguments separated by a comma, for example 0.5 and 0.8; the function then takes 2 arguments as input, P₁ and P₂, and we can use both of these variables in the return statement. Let's now flip both coins and write the code that computes the probability that coin 1 comes up heads and coin 2 comes up heads. For 0.5 and 0.8, this would be?

Yes, 0.4 is the product of these two values over here. So the solution is just the product p1 * p2, and hitting the run button gives me indeed 0.4. I can now go and change these probabilities to 0.1 and 0.8. You probably already figured out that the answer is now 0.08, and indeed, my code gives me that result, 0.08.
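A sketch of the two-argument version:

```python
def f(p1, p2):
    # Both independent coins come up heads.
    return p1 * p2

print(f(0.5, 0.8))  # 0.4
print(f(0.1, 0.8))  # 0.08 (up to floating-point rounding)
```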

And now comes the hard part. I have coin 1 and coin 2, and each of them has a probability, as before, that it comes up heads; we call these P₁ and P₂. But here's the difficulty: before I flip, I'm going to pick one coin. I pick coin 1 with probability P₀, and I pick coin 2 with probability 1 - P₀. Once I've picked the coin, I flip it exactly once, and now I want to know: what's the probability of heads? Let's do an exercise first. Say P₀ is 0.3, P₁ is 0.5, and P₂ is 0.9. What's the probability of observing heads for this specific example?

And the answer was 0.78, and here is how we got there. I pick coin 1 with probability 0.3, and once I've picked it, the chance of getting heads is 0.5. But I might alternatively have picked coin 2, with probability 1 - 0.3 = 0.7, and coin 2 gives me heads with a chance of 0.9. When I work all this out, I get 0.3 × 0.5 + 0.7 × 0.9 = 0.78.

So the task for you now is to implement the function with three input arguments so that it computes this number over here, and so that you can vary any of them and still get the correct answer. If you've never programmed before, this is tricky. You have to add one more argument, and you have to change the return statement to implement a formula just like this one, using p0, p1, p2 as arguments rather than the fixed numerical values here.

And here is my answer. You can really just read off the formula I just gave you: with probability p0 we pick coin 1, and it comes up heads with probability p1; with 1 - p0 we pick coin 2, and it comes up heads with probability p2. You can now give it three arguments p0, p1, and p2, such as 0.3, 0.5, 0.9, and it gets us 0.78 when I hit the run button. Interestingly, if you change these numbers, for example the first one to 0.1 and the last one to 0.2, I now get a different result, 0.23.
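The total-probability formula just described, as a sketch:

```python
def f(p0, p1, p2):
    # Total probability of heads: with probability p0 we flip coin 1
    # (heads with p1), otherwise coin 2 (heads with p2).
    return p0 * p1 + (1 - p0) * p2

print(f(0.3, 0.5, 0.9))  # 0.78 (up to floating-point rounding)
print(f(0.1, 0.5, 0.2))  # 0.23 (up to floating-point rounding)
```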

Let's go back to the cancer example. The prior probability of cancer we shall call P₀. This is the probability of a positive test given cancer; I call this P₁. And careful: this is the probability of a negative test result given you don't have cancer, which I call P₂. Just to check, suppose the probability of cancer is 0.1, the sensitivity is 0.9, and the specificity is 0.8. Give me the probability that a test will come out positive. It's not Bayes' rule yet, it's a simpler calculation, and you should know exactly how to do it.

The answer is 0.27. We first consider the possibility that we actually have cancer, in which case our test will give us a positive result with 0.9 chance, and then we add the possibility of not having cancer, which is 1 - 0.1 = 0.9, which gives us a positive result with 0.2 chance, or 1 - 0.8. Resolving this gives us 0.09 + 0.18, which adds up to 0.27. So now, I want you to write the computer code that accepts arbitrary P₀, P₁, P₂ and calculates the resulting probability of a positive test result. And here's my answer. My code does exactly what I showed you before. It first considers the possibility of cancer and multiplies this by the test sensitivity P₁, and then it adds the opposite possibility; of course, the specificity over here refers to a negative test result, so we take 1 minus that to get the positive rate. Adding these two parts up gives us the desired result, so let's try this. Here's my function f with the prior we just assumed, and if I hit run, I get 0.27. Obviously, I can change the prior. Suppose we make it much less likely to have cancer by changing the prior from 0.1 to 0.01; then my 0.27 changes to 0.207. Now realize this is not the posterior in Bayes' rule. This is just the probability of getting a positive test result. You can see this if you change the prior probability of cancer to zero, which means we don't have cancer no matter what the test result says, yet there's still a 0.2 chance of getting a positive test result. The reason is that our test has a specificity of 0.8; that is, even in the absence of cancer, there's a 0.2 chance of getting a positive test result.
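The code just described might look like this (a sketch; printed values are exact up to floating-point rounding):

```python
def f(p0, p1, p2):
    # P(Pos) = P(C) * sensitivity + P(not C) * (1 - specificity)
    return p0 * p1 + (1 - p0) * (1 - p2)

print(f(0.1, 0.9, 0.8))   # 0.27
print(f(0.01, 0.9, 0.8))  # 0.207
print(f(0.0, 0.9, 0.8))   # 0.2, even with a zero prior
```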

Now, let's go to the holy grail and implement Bayes' rule. Let's look at the posterior probability of cancer given that we received a positive test result, and let's first do this manually for the example given up here. So what do you think it is?

And the answer is 0.333, or 1/3, and now we're going to apply the entire arsenal of inference we just learned about. The joint probability of cancer and a positive test is 0.1 × 0.9. That's the joint; it's not normalized. So let's normalize it, and we normalize it by the sum of the joint for cancer and the joint for non-cancer. The joint for cancer we just computed, but the joint for non-cancer assumes the opposite prior, 1 - 0.1, and it applies the positive result to a non-cancer case. Because the specificity refers to the negative result, we have to do the same trick as before and multiply by 1 - 0.8. When you work this out, you find it to be 0.09 divided by 0.09 + 0.9 × 0.2, where 0.9 × 0.2 = 0.18. So if you put all of this together, you get exactly a third.

So I want you to program this in the IDE, where there are three input parameters P₀, P₁, and P₂. For those values, you should get 1/3, and for these values over here, 0.01 as a prior, 0.7 as sensitivity, and 0.9 as specificity, you'll get approximately 0.066. So write this code and check whether these examples work for you.

And here's my code; this implements Bayes' rule. You take p0, the prior, times the probability of seeing a positive test result, and divide by the sum of the same plus the expression for not having cancer, which is the inverse prior times the inverse of the specificity, as shown over here. When I plug in my reference numbers, the ones from over here, I indeed get 0.33333. So this is the correct code, and we can plug in other numbers. It's fun if we give it a zero prior probability of having cancer, and guess what: no matter what the test says, you still don't have cancer. That's the beauty of Bayes' rule; it takes the prior very seriously.
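Bayes' rule as just described, sketched with explicit joints and a normalizer:

```python
def f(p0, p1, p2):
    # Bayes' rule: posterior P(C | Pos).
    joint_c = p0 * p1                 # joint for cancer and a positive test
    joint_nc = (1 - p0) * (1 - p2)    # joint for no cancer and a positive test
    return joint_c / (joint_c + joint_nc)

print(f(0.1, 0.9, 0.8))   # roughly 0.33333, i.e. one third
print(f(0.01, 0.7, 0.9))  # roughly 0.066
print(f(0.0, 0.9, 0.8))   # 0.0: a zero prior stays zero
```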

Suppose you are sick, and you wake up with a strong pain in the middle of the night. You are so sick that you fear you might die, but you're not sick enough not to apply the lessons of my Statistics 101 class to make a rational decision about whether to go to the hospital. And in doing so, you consult the data. You find that in your town, over the last year, 40 people were hospitalized, of whom 4 passed away, whereas the vast majority of the population of your town never went to the hospital, and of those, 20 passed away at home. So compute for me the percentage of people who died in the hospital and the percentage of people who died at home.

Now, this is a fictitious example with relatively large numbers, but what's important to notice is that the chance of dying in a hospital, 10%, is 40 times as large as the chance of dying at home, 0.25%. That means whether you die or not is correlated with whether or not you are in a hospital. So let me ask the critical question. Should you now stay at home? Given that you are a really smart statistics student, can you resist the temptation to go to the hospital, because, indeed, it might increase your chances of passing away?

Both answers count, but clearly the correct answer is no, you should go to the hospital. Hospitals don't cause sick people to die. I know there have been lots of studies showing that if a perfectly healthy person goes to the hospital, they might actually catch a disease there, but hospitals have a purpose: they want to cure people. Why is this interesting? Because based on the correlation data, it seems that being in a hospital makes you 40 times as likely to die as being at home, but it doesn't follow that by staying at home you'll reduce your chances of dying. This is a statement of correlation.

"Being in a hospital, that fact alone, increases your chance of dying by a factor of 40." That is a causal statement: it says the hospital causes you to die, not merely that it coincides with the fact that you die. And very frequently, people in the public get this wrong. People observe a correlation, but they present it as if the correlation were causal, tempting you to read the statistic as causation. Now, to understand why this could be wrong, let's dive a little bit deeper into the same example.

Let's say that of the 40 people in the hospital, 36 were actually sick, and 4 of them passed away, while 4 were healthy, and they all survived. Let's further assume that of the people at home, 40 were indeed sick, and half of them, 20, passed away, whereas the remaining 7,960 were healthy and incurred a total of 20 deaths, perhaps because of accidents. These statistics are consistent with the statistics I gave you before; we've just added another variable, whether the person is sick or healthy. Please now fill out once again, in percent, the percentage of people who passed away in each of those 4 groups.

When you divide 4 by 36, you get 0.111, or in percent, 11.1%. That's the mortality rate of sick people in the hospital. It's 0 for the healthy people there. The mortality rate of sick people at home is 50%, since half of them passed away, and approximately 0.25% for healthy people at home. Now, if we look at this, we realize that you are presumably sick, and if you fall into the sick category, your chances of dying at home are 50%, while they are just about 11% in the hospital. So you should really go to the hospital, very quickly.
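The four group rates can be tabulated in a few lines (a sketch using the fictitious counts from this example; the data layout is ours):

```python
# (number of people, number of deaths) per group, from the hospital example.
groups = {
    ("hospital", "sick"):    (36, 4),
    ("hospital", "healthy"): (4, 0),
    ("home", "sick"):        (40, 20),
    ("home", "healthy"):     (7960, 20),
}

# Death rate in percent for each group.
rates = {key: 100.0 * deaths / people
         for key, (people, deaths) in groups.items()}

for key, rate in rates.items():
    print(key, round(rate, 2), "%")  # 11.11, 0.0, 50.0, 0.25
```

Conditioned on being sick, the home rate (50%) far exceeds the hospital rate (11.1%), even though the unconditioned rates point the other way; this is the confounding effect discussed below.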

Let's observe in more detail why the hospital example gave us such a wrong conclusion. We studied two variables: being in the hospital, and dying or passing away. We rightfully observed that these two things are correlated. If we were to do a scatter plot with two categories, whether or not a person is in the hospital and whether or not a person passed away, you find there's an increased occurrence of data over here and over here relative to the other two data points. That means the data correlates. What does correlation mean? In any plot, data is correlated if knowledge about one variable tells us something about the other. This is a correlated data plot. Here's another data plot. Correlated or not? Yes or no?

The answer is no. No matter where I am in A, B seems to be the same.

And now the data sits in a square in which it is uniformly arranged. Correlated, yes or no?

The answer is negative. No matter where I am in A, the range for B is the same, as is the mean estimate.

Another data set: there's a boomerang shape over here. Correlated? Yes or no?

The answer is yes: clearly, for different values of A, I get different values of B. It's a nonlinear correlation, yet still a correlation.

So clearly, in our example, whether or not you're in a hospital correlates with whether or not you die, but the truth is, the example omitted an important variable: the sickness, the disease itself. And in fact, the sickness did cause you to die, and it also affected your decision of whether to go to a hospital or not. So if you draw arcs of causation, you find sickness causes death, and sickness causes you to go to the hospital, and if anything at all, once you knew you were sick, being in the hospital negatively correlated with dying; that is, being in a hospital made it less likely for you to pass away given that you were sick. In statistics, we call this a confounding variable. It's very tempting to just omit it from your data, and if you do, you might find correlations, in this case a positive correlation between the hospital and death, that have nothing to do with the way things are actually caused, and as a result, those correlations don't tell you at all what you should do. So let's study another example.

Suppose you observe a number of different fires and you graph the number of firefighters versus the size of the fire. For the sake of the argument, let's assume we studied four fires with 10, 40, 200, and 70 firefighters involved, and the sizes of the fires, in terms of the surface area the fire occupied, were 100, 400, 2000, and 700. Putting this into a diagram, plotting the number of firefighters against the size of the fire, you get pretty much the following; in fact, as you've already learned, this looks very linear. So let me ask a question: is the number of firefighters correlated with the size of the fire? Yes or no? Check one of the two boxes.

And obviously it is because there's a strong linear correlation.
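For these four fires, the correlation is in fact perfect, since each size is exactly 10 times its crew. A sketch that checks this with Pearson's correlation coefficient (the helper function is ours, not from the course):

```python
# Pearson correlation between firefighter count and fire size
# for the four fictitious fires above.
firefighters = [10, 40, 200, 70]
sizes = [100, 400, 2000, 700]  # each size is exactly 10x the crew

def pearson(xs, ys):
    # Covariance of xs and ys divided by the product of their
    # standard deviations (scale factors cancel, so we skip the 1/n).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(pearson(firefighters, sizes))  # 1.0: a perfect linear correlation
```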

Now, the real question I'm building up to is, "Do firefighters cause fire?" Or, more extremely, "If we got rid of all our firefighters, would we get rid of all the fire?" After all, this seems to be in the data.

And the answer is no. This is a case of what we call reverse causation. You can argue that the size of the fire causes the number of firefighters that is being deployed, because the bigger the fire, the more firefighters the fire department will send. Now, our graph, which shows the correlation between these two variables, is oblivious to the direction of this arc. You could conclude that the size of the fire causes the number of firefighters, or you could conclude that the number of firefighters causes the size of the fire; in both cases you could use exactly the same data. But when I put it this way, it's pretty obvious that the right answer is that the size of the fire causes the number of firefighters to go up, and that insight is not from the data itself; it's because we know something about fires and firefighters. It's impossible to deduce from this data alone that there is a causal relationship. It could be just coincidental, or the causal relationship could go either way.

So here's my assignment for you. Go online, check old news articles, and find me one that takes data showing a correlation and from that data suggests causation, or, put differently, tells you what to do. I argue that the news is full of these abuses of statistics, and we will talk later about how to set up a study to avoid this trap. For your assignment, if you find an article that has this property, extract the text and post it to the discussion forum. I will be monitoring and commenting on those, and I want all of us to enjoy the kind of hilarious misinterpretations of statistics you can find when people confuse correlation and causation. So go ahead and find me interesting articles. Thank you!

- An Intuitive Explanation of Bayes' Theorem by Eliezer Yudkowsky
- Bayes' Rule on Wikipedia
- Probability on Khan Academy. Includes Bayes
- Another example of Bayes' Rule on YouTube
- Bayes' Rule at the Queen Mary School of Electronic Engineering and Computer Science, London