**These are draft notes extracted from subtitles. Feel free to improve them. Contributions are most welcome. Thank you!**

**Please check the wiki guide for some tips on wiki editing.**

Contents

- 1 Problem Set 2: Probability
- 1.1 01 Relative Probabilities 1
- 1.2 02 Relative Probabilities 1 Solution
- 1.3 03 Relative Probabilities 2
- 1.4 04 Relative Probabilities 2
- 1.5 05 Same Coin
- 1.6 06 Same Coin Solution
- 1.7 07 Many Flips
- 1.8 08 Many Flips Solution
- 1.9 09 Is It Fair 1
- 1.10 10 Is It Fair 1 Solution
- 1.11 11 Is It Fair 2
- 1.12 12 Is It Fair 2 Solution
- 1.13 13 Programming Challenge (Optional)
- 1.14 14 Programming Challenge (Optional) Solution

Let's suppose we have a fair coin and we flipped it four times.I'd like you to tell me what is the probability that there is exactly one head divided by the probability that only the very first flip is a head.Please enter your answer here.

The correct answer is 4.To see this, note that for the first flip to be only a head, the following is the only sequence possible.For the numerator to be true, there are three more possibilities.

*Note:*

Here's a detailed solution:

What's being asked is the ratio of the probabilities:

There are

possibilities for exactly one head:

There is one possibility for first flip is the only head:

Note that if you calculate the ratio, the fractions cancel out and all that remains is

, therefore:

Now, I'd like to tell me the ratio of the probability of exactly one head to the probability ofthe first flip being heads regardless of the other flips.

And the answer is 0.5, 1/2.To see this, recall that there were four possibilities for the numerator--all equally likely,and if the first flip is heads and we have no other restrictions, the last three flips can be anything.So each one has two possibilities and 2*2 2=8 and 4/8=0.5.*

*Note:*

Here's a detailed solution:

The following ratio of probabilities is being asked:

You have already calculated the probability of exactly one head:

For

, there are the following possibilities ( possibilities):

Or reasoning in a different way (Easier to visualize with a tree diagram):

```
First Flip: H
/ \
Second Flip: H T
/ \ / \
Third Flip: H T H T
/ \ / \ / \ / \
Fourth Flip: H T H T H T H T
```

The first flip has to be Heads and the remaining flips can be anything.

For the second flip you have

possibilities (Heads or Tails), therefore possibilities.For each of the possibilities for the second flip, you have two possibilities for the third flip as well, therefore

.For each of the possibilities for the third flip, you have two possibilities for the fourth flip as well, therefore

.

Think about all the paths you can choose if you start from the top of the diagram and want to go to the bottom.

The probability of first flip is heads is therefore:

Note that if you calculate the ratio, the fractions cancel out (as in the first question) and all that remains is

:

Let's assume there are two coins equally distributed, same probability of having each.The first type of coins are fair. The second type of coins have a probability of heads of 0.9--90%.Let's assume you and I both flip a coin. When I flipped, I get the following sequence.When you flipped, you get the following sequence.So please write in this box the probability that we flipped coins of the same type.

And the answer is 0.523, 52.3%To see this, first, recognize that these two sequences have the same probability.The probability with a fair coin of this sequence is 0.125,whereas with our loaded coin, the probability is 0.081.So, the probability of either of these sequences of flips is 12.5% if the coin is fair and only 8.1% if the coin is loaded.So, to figure out the probability that the coin is fair, we'll just use Bayes' Rule.Note that the 0.5 simply represents that there's an equal or 50%chance of having coin 1 versus coin 2.And this expression equals 0.6080 and so the probability that we have the same coin is simply the probability that we both have type 1 or fair coins, plus the probability of my having coin 2 times the probability of your having coin 2.And since the probabilities are the same, since the underlying probabilities were the same for each flip,we have 0.608² plus (1-0.608)² and evaluating this expression gives us our answer.

*Note:*

Here's a detailed solution. These are the given probabilities:

What you want is the probability of both coins being of the same type (fair or loaded) given the evidence (flips). Let's start with the case where both coins are fair:

Applying Bayes' rule:

Where:

These are the "normalizers", calculated using the rule of total probability.

Therefore:

Substituting:

So, the probabilities are the same and that's the reason the TA omitted the calculations for the second coin.

*Observation*: The value given in the answer video is actually incorrect, but since it is just an intermediate result, the TA didn't notice his mistake.

Note also that since

and the TA has chosen to put the in evidence and that's the reason is multiplying the denominator in the answer video.These probabilities are independent, so to find the probability of both

and being fair is simply the product:

Let's now calculate the probabilities for the case where both coins are loaded:

Since these are complementary probabilities and they must always add up to one.

These probabilities are also independent, so to find the probability of both MyCoin and YourCoin being loaded is simply the product:

The final answer is therefore:

Now, I'd like to ask you about what happens as the number of flips or events, in general, becomes a very large number.Check all that apply. The probability of every individual sequence becomes small. The probability of every number of heads becomes small.So for example, does the probability of having one head or hundred heads or million heads,any specific number does that becomes small.Does the probability of every given proportion, for example, one head out of 100 flips,10 heads out of 100 flips or 1500 heads out of a billion flips becomes small.Does every given range of proportions have a smaller and smaller probability.Are there some ranges of proportions for which the probability becomes small as we have many flips.Check all that apply. Also, assume that the probability of heads is neither 0 or 1.That is, we can get both heads and tails from this coin.

And the answer is, the probability of every sequence becomes small.This is true because there become more and more sequences.Every time you flip a coin, you double the number of sequences.And they have varying probabilities but they are always more against the probability,always goes down and always multiplying in a number less than 1.That's why this assumption is important.The probability of every number of heads become small.This is a little trickier that this is also true because as the number of flips increases,the number of heads must increase even if we're assuming some low probability for it.So, it becomes farther and farther away from what we expect.The probability of every proportion of heads also becomes small.This is a little trickier but the thing to remember is--every proportion of heads, corresponds to a given exact number of heads.Now that exact number of heads changes as, let's say,I double the number of flips, I also double the number of heads I need. But, that's going to always be a smaller number, since again that probability has to be divided into the chance that I could just have one more head or one less head which is going to be an ever so slightly different proportion.Now, this statement is not true. The probability of every range of proportions can't become small.The range of all proportions is a range and we know that has to stay at one, So, that can't be true and the probability of some ranges of proportions become small.Well, this will also be true--consider something improbable like getting the proportion--a proportion below 10% on a fair coin.That's going to tend to become rare and rare and rare as you flip it more times,because the same rare or unlikely event has to keep repeating itself.

*Note:*

Here's a detailed explanation:

**"The probability of every sequence become small"**✔

All the sequences (outcomes) are equally probable, i.e., . Example, if ( is the number of flips), the possible outcomes (sequences) are { } (two outcomes), therefore the probability of each outcome is . If , the possible outcomes are { } (four outcomes), therefore the probability of each outcome is . The probability of every outcome does become smaller as we increase the number of flips .

**"The probability of every number of heads become small"**✔

As the number of flips increases, so the number of possible outcomes for a given number of Heads . Let's say . If ( is the number of flips), there is only one possibility where the number of Heads is equal to one. If , there are two possibilities. If , the possible outcomes are { }, therefore there are 3 possibilities where the number of Heads is equal to one. Note that the probability of the number of Heads being equal to one is: , so , , , ... The probabilities decrease as increases. For the analysis is simple, but how about greater values of ? The problem boils down to figuring out how many outcomes have number of Heads. You are going to learn this in future units, so don't worry so much about it. You're going to learn that the number of outcomes with Heads is given by the binomial coefficient, , or choose , i.e., the number of ways you can choose elements from a set of elements. Why? Let's say you want to build all possible outcomes with 2 Heads. How could you do this? You could choose a pair of flips from the sequence flips (let's say the 3rd and the 4th flips, for example), make them equal to Heads and make the remaining flips equal to Tails. That makes the number of outcomes with 2 Heads equal to the number of ways you can choose 2 elements from a set of elements or . You can see that this easily extends to , so the number of outcomes where the number of Heads is equal to is . No matter the value of (remembering that the maximum allowed number of Heads is , therefore ), the number of total possible outcomes increases faster with than the number of outcomes where the number of Heads is . That's because the total number of outcomes increases exponentially with the number of flips ( ), while doesn't.

**"The probability of every proportion become small"**✔

Saying that "every proportion become small" is the same as saying that the number of Heads is some fraction of the number of flips ( ), e.g., . If you plot a barchart of the distribution of , you are going to get a bell-shaped plot:

As we increase the value of

, the number of outcomes increases exponentially ( ). This means that the base of the bell becomes wider and wider as we increase the number of flips.Since the probabilities for all possible outcomes have to add up to one, as the number of outcomes increases, the probabilities for each possible number of Heads also decreases. This means that the bell becomes shorter and shorter as we increase the number of flips. Therefore the probability of

decreases with the number of flips. This reasoning justify the decrease for the proportion , but how about proportions that are off centered?Let's take a proportion that is off center of the distribution, for instance,

. Since the number of outcomes increases with the number of flips, greater the number of flips, farther and farther we will get from (middle). Farther we get from the middle, smaller the probability (because of the bell shaped distribution). Also we know that as the number of flips goes to infinity, the mean of the flips gets closer and closer to the coin's probability ( ), therefore as we increase the number of flips, the probabilities get concentrated near and the bell's body gets narrower and narrower, therefore, as we increase the number of flips, the probabilities farther from the center decrease (since the probabilities have to add up to one). For these two reasons, the probability of proportions off center also get smaller as we increase the number of flips. Therefore, the probability of every proportion gets smaller as we increase the number of flips.**"The probability of every range of proportions become small"**✘

That's easy to disprove. If you take the range of proportions that includes all possible proportions, that's also a range of proportions (in this range any proportion is allowed). The probability of this range of proportions is therefore equals to one and will always be one no matter the number of flips. Therefore doesn't increase, remains constant.

**"The probability of some range of proportions become small"**✔

That's also easy to prove. A range of proportions can be a single proportion. We know that the probability of a proportion decreases as we increase the number of flips, therefore the statement is true.

Let's assume we have a fair coin and a loaded coin and the loaded coin has a probability of heads of 0.9.Let's assume that we know that the probability of actually having a loaded coin is zero.In which of these cases is the probability of being fair given the flips <0.5.Check all that apply. Note that 4 H 0 T means four head and zero tails.

*Observation:*

2H3T = , not all the outcomes with 2 Heads and 3 Tails, i.e.:

{ } (10 possible outcomes).

And in none of these cases is the probability of a fair coin given the data less than 0.5.We know this simply because the probability of having a loaded coin at all is zero. So if that doesn't make intuitive sense we can apply Bayes' Rule since P of loaded is zero. P of fair is one and so P of flips given fair is always going to be equal to P of flips because that's the only way you can get the flips. The alternative is zero and so this whole expression will always be one. You could also do this with the reverse in P of loaded and you'd see it would always come out to be zero because this term will be zero.

*Note:*

Being , this means that the coin is Fair. Since the coin fair, P(Coin=Fair|Flips) = 1, i.e., you already know that the coin is fair and that's independent of the flips. P(Coin=Fair|Flips) is always equal to one (no matter what the flips are) and therefore all statements are false.

Now let me ask you if the probability of having a loaded coin is 0.1, what happens to the probability of fair given flips in each of these cases.Check all that have a probability of a fair coin of less than 0.5.

We have three answers--we can quickly rule out these because these should be less likely from a loaded coin that's disproportionately heads since they each have more tails than heads.We can also say that if four heads in a row is sufficient evidence that we have a loaded coin 10 and 20 heads should be two.So let's just evaluate this for the four heads' case.So the probability of having four heads in a row from a loaded coin is going to be 0.6561.Whereas the probability for a fair coin is 0.0625 but remember this just gives us the probability of flips given fair and given loaded.What is we have to multiply each of these by the probability of fair and the probability of loaded and so those are.So we can see here the probability of having a loaded coin times the probability of the flips given the loaded coin is 0.06561 and the probability of having the same flips with a fair coin times its probability is 0.05625.Together they are equal to P of flips.So only one of these divided by that can be greater than 50% so it's going to be the bigger one that's going to be P of loaded, and if P of loaded given flips is greater than 0.5 then P of fair must be less.So this is true which means this is true and this is true.

*Note:*

Here are the calculations for this problem:

Bayes' Rule:

Therefore:

Let's calculate the probabilities required for the calculations:

Total Probability:

Finally the probabilities we need to check the statements:

✔

✔

✔

✘

✘

And now for our programming challenge, this is completely optional and is fairly challenging and requires a decent knowledge of programming but if you're up for a challenge and want to learn to do something fun, let's dive in.We have a bag filled with coins. Each of these coins maybe a bit different.This coin may have a probability of landing on heads of 0.5 and this coin might have a probability of 0.1 Now, I'm going to draw one coin from this bag and I'm going to flip it repeatedly and each time I flip it,I'd like you to tell me what your best guess is for the probability of it being heads on the next flip.To do this, we're going to write a program in Python.What we're going to ask you to write is a Python class called flip predictor that's going to take the set of coins in the bag.In this case, coins with probabilities of landing on heads of 0.5, 0.4, and 0.3 then we're going to flip the coins several times.In this case, the results were heads, heads, tails, and heads.After every flip, we'll tell you what the result was and then we'll ask you to tell us what the probability of heads will be the next time we flip the coin.So to solve this, remember that Bayes' Rule tells us that the probability of the coin selected being a given coin is equal to the probability of the flip being what it was that it's head or tails given coin i times the probability of coin i being the coin divided by the probability of the flip either heads or tails.So now let's look at the code you're going to have to write.So here we are back in the our editor with an outline of the program you're going to have to write.First, we just make division work the way you'd expect it to work that is 1/2=0.5 so you don't have to think about it.Then we defined this thing called a class.If you aren't familiar of what a class is, don't worry, you're not going to actually have to do anything with the class.We set up the class right here for you.A class is just a way of grouping functions and data those functions can use.So here we just set up our data and that is we're given a set of coins in our bag and we just call that self.coins and so in these functions down here you're going to be able to access the coins in your bag as self.coins and that's going to be a Python list.And then we create this new list called self.probs and that's just going to be a list with the same number of items as in coins giving the probability that each point in time of a coin being the coin selected.At first, since the coin was selected at random that probability is simply 1/nso if there are three coins it's a third, if there are 10 coins it's a 10th.So what I'd like you to do is fill in these two functions.P heads is a function that just returns the probability of heads right now.So given your best guess of the current probabilities,what's the probability of coming up with heads?And you can compute this using self.probs and self.coins.Then for the slightly trickier part, I want you to write an update function and that's going to take a result either in H or a T, and based on that result, you're going to update self.probs so that p heads will return with the correct value in the future.Good luck! This is a very challenging problem. I hope you enjoy it.

```
#FlipPredictor
#A coin is drawn at random from a bag of coins of varying probabilities
#Each coin has the same chance of being drawn
#Your class FlipPredictor will be initialized with a list of the probability of
#heads for each coin. This list of probabilities can be accessed as self.coins
#in the functions you must write. The function update will be called after every
#flip to enable you to update your estimate of the probability of each coin being
#the selected coin. The function pheads may be called and any time and will
#return your best estimate of the next flip landing on heads.
from __future__ import division
class FlipPredictor(object):
def __init__(self,coins):
self.coins=coins
n=len(coins)
self.probs=[1/n]*n
def pheads(self):
#Write a function that returns
#the probability of the next flip being heads
def update(self,result):
#Write a function the updates
#the probabilities of flipping each coin
#The code below this line tests your implementation.
#You need not change it
#You may add additional test cases or otherwise modify if desired
def test(coins,flips):
f=FlipPredictor(coins)
guesses=[]
for flip in flips:
f.update(flip)
guesses.append(f.pheads())
return guesses
def maxdiff(l1,l2):
return max([abs(x-y) for x,y in zip(l1,l2)])
testcases=[
(([0.5,0.4,0.3],'HHTH'),[0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553]),
(([0.14,0.32,0.42,0.81,0.21],'HHHTTTHHH'),[0.5255789473684211, 0.6512136991788505, 0.7295055220497553, 0.6187139453483192, 0.4823974597714815, 0.3895729901052968, 0.46081730193074644, 0.5444108434105802, 0.6297110187222278]),
(([0.14,0.32,0.42,0.81,0.21],'TTTHHHHHH'),[0.2907741935483871, 0.25157009005730924, 0.23136284577678012, 0.2766575695593804, 0.3296000585271367, 0.38957299010529806, 0.4608173019307465, 0.5444108434105804, 0.6297110187222278]),
(([0.12,0.45,0.23,0.99,0.35,0.36],'THHTHTTH'),[0.28514285714285714, 0.3378256513026052, 0.380956725493104, 0.3518717367468537, 0.37500429586037076, 0.36528605387582497, 0.3555106542906013, 0.37479179323540324]),
(([0.03,0.32,0.59,0.53,0.55,0.42,0.65],'HHTHTTHTHHT'),[0.528705501618123, 0.5522060353798126, 0.5337142767315369, 0.5521920592821695, 0.5348391689038525, 0.5152373451083692, 0.535385450497415, 0.5168208803156963, 0.5357708613431963, 0.5510509656933194, 0.536055356823069])]
for inputs,output in testcases:
if maxdiff(test(*inputs),output)<0.001:
print 'Correct'
else: print 'Incorrect'
```

And here is my answer, to return the probability of heads given a set of probabilities of heads and a set of probabilities for each coin, we just need to add up the probability of head for each coin multiplied by the probability it could be that coin.To do that, I will just use this nifty little Python feature that lets me walk through these lists together called zip.We'll just take two lists and gives me the first two items, then the next two items, then next two items and so on and I call the probability of a coin coming up heads p coin and the probability of it being that coin p.And so we just need to multiply those two things together and Python has a nice little sum functions which sums up the whole thing and we will just return that.Now the slightly trickier bit is this update piece.First to make my life easier, I'll just store the current value of the probability of heads.We need this because if you recall the denominator of Bayes' Rule is the probability of the result.Now if the result is heads, we want to update our probabilities using the same basic method of walking through these two lists together.We're going to update this by taking the probability for the given coin and multiplying it by the probability that we would have gotten that result that is p coin which is just the entry from coins which is the probability of heads for that coin and since we'd got heads that is the probability of the result given that it's the coin and we just divide it by the total probability of it being heads and if it's tails, we just need to do the reverse.The probability of having tails given that it's a certain coin is just going to be 1 minus the probability with being heads.Again, we have to multiply it by the probability that it's that coin in the first place and then we again just divide by the total probability of it being tails which is just one minus the probability of being heads.