
1. Teaser



01 Welcome

Welcome to Statistics 101, taught by me, Sebastian Thrun, and by Adam Sherwin, who is our assistant instructor doing a lot of work in the background. I figure I'll start by giving you a teaser, a challenging teaser. I'm going to provoke you. So, this is you. I believe you should be unhappy, not because our class is so bad, but because you and I will prove in a second that you are unpopular. The reason I show this is to show you how deep statistics is and how easily we can fool ourselves. Let's dive in.

You're unpopular and not good-looking, but still I think you're beautiful inside.  Despite your big nose and googly eyes.

02 Average Friends

Let's say there are two types of people for simplicity: type A and type B. Type A are the popular ones. They have 80 friends. And type B are less popular. They only have 20 friends. You might now say that I don't know which type you are. I will compute what's called the expected or average number of friends. To do so, I assume that half of the people are of type A and half of the people are of type B. Here's your very first quiz in this class. Into this box enter what you think is the expected or average number of friends if you have a 50% chance of being a type A and a 50% chance of being a type B. Type in here when you're done. There is a submit button somewhere down here. You can see if your answer is correct. Regardless of whether you finish this or not, at some point just hit the next button, and you'll see my answer.


03 Average Friends Solution

My answer is 50 friends. The way I get there is this: I don't know what type you are. With a 50% chance, or 1/2, you're type A, in which case you have 80 friends, and with a 50% chance you're type B and have 20 friends. That gives me this equation as the expected number of friends: \frac{1}{2}\times80+\frac{1}{2}\times20. Working this out means you have 40 + 10 = 50 friends. So far, so good, but why are you unpopular?

It may be the case right now that you don't know anything about probabilities or what "expected number of..." means, so here is a different way to look at it. Suppose our social network has n people, some big number, perhaps 500 thousand or 1 million. "50% probability of being Type A" is equivalent to saying that half of the people in our social network, or \frac{n}{2}, are of Type A. "50% probability of being Type B" is equivalent to saying that half of the people in our social network, or \frac{n}{2}, are of Type B. You likely know how to average things. Let's say that you play cards once a month with your friends and that in the last three games you won $10, $15 and $5. You may say that on average you won \frac{10+15+5}{3}= $10 a month over the past three months. So, going back to our original problem, what would be the average number of friends per person in the social network? You have \frac{n}{2} people with 80 friends each, which gives \frac{n}{2}\times80=40\times n. You have \frac{n}{2} people with 20 friends each, which gives \frac{n}{2}\times20=10\times n. Therefore, the average number of friends per person would be \frac{40\times n+10\times n}{n}=50.
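The same average can be checked with a few lines of Python. This is only a sketch: the function name `expected_friends` and the list `population` are made up for illustration, and exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction

def expected_friends(types):
    """Weighted average of friend counts.

    types: list of (probability, friend_count) pairs,
    where the probabilities are assumed to sum to 1.
    """
    return sum(p * friends for p, friends in types)

# Half of the people are type A (80 friends), half are type B (20 friends).
population = [(Fraction(1, 2), 80), (Fraction(1, 2), 20)]
average = expected_friends(population)
print(average)  # 50
```

Changing the shares or the friend counts in `population` shows how the average moves, for example when type A people are rarer than type B.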

04 Expected Friend Type

Here is your Facebook or G+ page, and of course you're smiling. On it is your list of friends, and we already know it's either 80 or 20 friends. In expectation it's 50 friends. Let's pick a random one of your friends, like this one. This person will also have a Facebook or a G+ page. Before I raise the question how many friends this person has, let's consider that this might be either a type A or a type B person. Keep in mind that type A have 80 friends and type B have 20 friends. The question I have for you is what are the chances you picked a type A friend? This should be a number between 0 and 1. I'm also going to ask you for the opposite. What are the chances you picked a type B? Please enter both numbers, and these numbers should sum up to 1. Submit and then next. I should warn you this is a challenging question. If you don't get this right, don't worry. This is the type of stuff you'll know when you've taken the class. I just want to tease you a little bit in the beginning.


05 Expected Friend Type Solution

Here's the interesting finding. Because type As are so much more popular, your chance of linking to a type A is 0.8. To see this, let's take the extreme view. Suppose type Bs had 0 friends. Then you'd never link to a type B. You link to type A and to type B in a proportion of 80 to 20. Type B would be 0.2. That means most of the friends you link to are type A. They're the type of people that happen to be popular.

Note: We are given the information that:

  • A Type A person has 80 friends.
  • A Type B person has 20 friends.

This means that if you pick any person from this social network, this person either has exactly 80 friends or exactly 20 friends.

Let's say our social network has n people, some large number, perhaps 500 thousand or 1 million people. Half of these people, or \frac{n}{2}, are of Type A and half are of Type B (as defined by the instructor earlier). We may separate these people into two groups:

  • Group A: All the people of Type A. Together they have \frac{n}{2}\times80, or 40\times n, connections, since there are \frac{n}{2} people of Type A and each one has 80 friends.

  • Group B: All the people of Type B. Together they have \frac{n}{2}\times20, or 10\times n, connections, since there are \frac{n}{2} people of Type B and each one has 20 friends.

That's a total of (40\times n+10\times n) or 50\times n connections in our social network. Note that Group A has 4 times as many connections as Group B, so a given person in your list of friends is 4 times as likely to come from Group A as from Group B. In summary, a person from Group A has a better chance of being connected to you than a person from Group B, because Group A has more connections.

A simple intuition: Let's say you and your friend are playing the lottery. You have 1 lottery ticket and your friend has 10. Your friend is 10 times as likely to win as you are (although his chances of winning are still very slim, since so many people play).

So there is a chance of (40\times n) in (50\times n) that a person of Type A is connected to you:

P(\text{Type A})=\frac{40\times n}{50\times n}=80\%=0.8

And there is a chance of (10\times n) in (50\times n) that a person of Type B is connected to you:

P(\text{Type B})=\frac{10\times n}{50\times n}=20\%=0.2
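The two probabilities above can be reproduced with a short Python sketch. It assumes, as the note argues, that a friend is picked in proportion to the number of connections each group holds. The function name `friend_type_probabilities` and the dictionary layout are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

def friend_type_probabilities(types):
    """Probability that a randomly picked friend belongs to each type.

    types: dict mapping a type name to (population_share, friend_count).
    Assumption: each type is picked in proportion to its total
    number of connections, i.e. population_share * friend_count.
    """
    total = sum(share * friends for share, friends in types.values())
    return {
        name: share * friends / total
        for name, (share, friends) in types.items()
    }

types = {"A": (Fraction(1, 2), 80), "B": (Fraction(1, 2), 20)}
probs = friend_type_probabilities(types)
print(probs["A"], probs["B"])  # 4/5 1/5
```

Note that n cancels out, just as in the fractions above, so the sketch works with population shares instead of head counts.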

06 Unpopular

Let's now go back and ask the real question. In expectation, how many friends does this friend of yours have? Please put your number right here, and again, it's a challenging question. Don't get disturbed if you don't know the answer. This type of stuff we'll study in and out.


07 Unpopular Solution

Here is my surprising answer. It is 68 friends. Who would have thought? The way to get there is with 0.8 chance you'll pick a type A who has 80 friends and with 0.2 chance you'll pick a type B who has 20 friends. If you work this out this is 64 + 4, which makes 68. Your friend in expectation would have 68 friends where you would only have 50 in expectation. So, sorry, I think you are unpopular in expectation.

You don't know if you're Type A or Type B, therefore you don't know how many friends you have. Let's say that you have k friends (which could be either 80 or 20). What you do know is that, in expectation, 80\% of your friends are Type A and 20\% of your friends are Type B. So \frac{80}{100}\times k=\frac{4\times k}{5} of your friends have 80 friends each, which adds up to \frac{4\times k}{5}\times80=64\times k friends. Likewise, \frac{20}{100}\times k=\frac{k}{5} of your friends have 20 friends each, which adds up to \frac{k}{5} \times 20 = 4 \times k friends.
That's a total of 64 \times k + 4 \times k = 68\times k friends. Therefore, on average, your friends have \frac{68\times k}{k}=68 friends. So, what's the insight you get from this work? You determined in the beginning that you should expect to have 50 friends on average. But now you have determined that each one of your friends has 68 friends on average. And that's why you are unpopular (i.e., you should expect to have fewer connections on average than your friends).
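To compare the two averages side by side, here is a small Python sketch. It assumes that a friend is picked in proportion to how many connections each type holds, which is the argument made in these notes. The names `expected_friends_of_friend`, `network`, `own_average` and `friend_average` are hypothetical and only serve the illustration.

```python
from fractions import Fraction

def expected_friends_of_friend(types):
    """Expected friend count of a randomly picked friend.

    types: list of (population_share, friend_count) pairs.
    Assumption: a friend of a given type is picked with probability
    proportional to population_share * friend_count.
    """
    total_links = sum(share * count for share, count in types)
    return sum((share * count / total_links) * count for share, count in types)

network = [(Fraction(1, 2), 80), (Fraction(1, 2), 20)]

# Your own expected number of friends: 1/2 * 80 + 1/2 * 20.
own_average = sum(share * count for share, count in network)

# Your friend's expected number of friends: 0.8 * 80 + 0.2 * 20.
friend_average = expected_friends_of_friend(network)

print(own_average, friend_average)  # 50 68
```

If both types had the same number of friends, the two averages would coincide. The gap appears only because popular people show up in more friend lists.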

08 Course Overview

Let's talk about this class. Most of the material is very basic. It's the first class you would have in college if you're not a statistics major. We teach you things like how to visualize data, how to summarize it, how to run tests, and even find trends. But there are also a few nuggets in there that are challenges. These are optional, and they're clearly marked as optional, but I will let you prove some theorems along the way using little games that I play with you. Most important, you'll be afforded the possibility to program the things you've learned. Again, this is optional, because I don't expect you to have a programming background. But give it a try. I believe that through programming you learn the material much better than any other way. It's optional. You can assimilate all the material without it. But I would recommend giving it a try if you know how to program.