These are draft notes extracted from subtitles.
This unit is all about variance and one of its very close cousins, the standard deviation. Say you are a beginning college student, so your friends might be of the following ages: 17, 19, 18, 17, and 19. You also have 5 close family members of ages 7, 38, 4, 23, and 18. For both groups you can compute the mean. Please enter the mean on the right side.
And in both cases it's 18, as I'm sure you figured out: each set of ages sums to 90, and 90/5 = 18.
There's something really striking about these two age distributions: the friends' ages are clustered very close to 18, whereas the family members' ages are all over the place, in most cases really far from 18. Can we capture this in a number? The mode, the median, and the mean don't really capture the spread of the data. What does measure the spread is called the variance.
To compute the variance, the very first trick is to normalize each sequence by subtracting the mean from every data item. For the friends, subtracting 18 gives -1, 1, 0, -1, 1. For the family, subtracting 18 gives -11, 20, -14, 5, and 0. If you compute the mean of these two new sequences, you find it is 0 in both cases, and that's no accident: by subtracting the mean from every single data item, we took the mean out of the equation and arrived at 0. What's really interesting now is that the friends' numbers are much closer to zero than the family's numbers, so the spread of the family's ages is much larger.
You might think that to compute the variance you could just add up all these differences to measure the spread, but that doesn't work: they always add up to 0. So we have to add up something else, and what's commonly added up are the squares of these values. For the friends, the squares are 1, 1, 0, 1, 1, whereas for the family they're 121, 400, 196, 25, 0. For the variance, we sum these squares and normalize the sum by dividing by the number of items. Note that the squares are not taken of the original data but of the original data minus the mean, which we write with the Greek letter µ. Do me a favor and try to compute the variance for the first sequence, which you obtain by adding those squares and dividing the sum by 5.
The answer is 0.8: the squares sum to 4, and 4/5 = 0.8.
Let's do the same for the second sequence: add up the squares and divide by 5.
Now the answer is very different: the squares sum to 742, and 742/5 = 148.4. And that's interesting.
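A minimal Python sketch of the computation so far, using the two age lists from the example: subtract the mean, square each difference, and average the squares. The function names `mean` and `population_variance` are illustrative choices, not from the lesson; like the lesson, the variance divides by N.

```python
def mean(data):
    """Arithmetic mean: sum of the items divided by how many there are."""
    return sum(data) / len(data)


def population_variance(data):
    """Average squared deviation from the mean (divides by N)."""
    mu = mean(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


friends = [17, 19, 18, 17, 19]
family = [7, 38, 4, 23, 18]

print(mean(friends), population_variance(friends))  # 18.0 0.8
print(mean(family), population_variance(family))    # 18.0 148.4
```

Both groups share the mean of 18, but their variances differ by a factor of more than 180.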
The variance is a measure of how far the data is spread. It's really small if the data is clustered around the mean, and it's really large if the data falls far away from the mean. Here are two trees, and both trees have a couple of apples. Of course, at some point the apples fall down. On the ground, the apples are centered equally around the stem of each tree, so if you computed the mean position you would get back exactly where the stem of the tree is. But for the wide tree the spread is much larger, whereas for the narrow tree the spread is much smaller. You can think of the variance as a measure of how wide the tree was from which the apples fell.
So, coming back to our example, here are the variances again. Let me give you another really important term: the standard deviation. The variance by its very nature is quadratic: it's the average squared deviation from the mean. If you don't want something quadratic, you can take the square root of it, and you get what's called the standard deviation. In the first case √0.8 ≈ 0.8944, and in the second case √148.4 ≈ 12.18. This tells you that you can expect the age of an individual family member to deviate from the mean by about 12 years, whereas for the friends you expect the deviation to be just below 1 year. You can see why this is a good approximation: 4 of the 5 friends differ in age from the mean by 1 year, whereas some family members differ by as much as 20 years from the mean, so both the variance and the standard deviation are much larger. Let's practice this with a couple of quizzes. I'll give you data, and I want you to compute the mean, the variance, and the standard deviation. 3, 4, 5, 6, 7: please fill in all three of those values.
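A small Python sketch of the standard deviation as the square root of the variance, for the same two age lists. It assumes Python's standard `statistics` module; `statistics.pvariance` divides by N, matching the lesson's convention, and the helper name `std_dev` is illustrative.

```python
import math
import statistics


def std_dev(data):
    """Population standard deviation: square root of the average squared deviation."""
    return math.sqrt(statistics.pvariance(data))  # pvariance divides by N, like this lesson


friends = [17, 19, 18, 17, 19]
family = [7, 38, 4, 23, 18]

print(f"friends: {std_dev(friends):.4f}")  # about 0.8944
print(f"family:  {std_dev(family):.2f}")   # about 12.18
```

`statistics.pstdev` computes the same quantity in one call.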
Obviously, the mean is 5. Next, sum the squared differences from the mean: (-2)² = 4, (-1)² = 1, 0² = 0, 1² = 1, and 2² = 4. You get 10, but you want the average squared distance, which is 10/5 = 2. The square root of this is about 1.414.
8, 9, 10, 11, 12--what's the mean, the variance, and the standard deviation?
The mean is now 10, and the variance and standard deviation stay the same. The reason is that once you subtract the mean, you get -2, -1, 0, 1, 2: exactly the same deviations as for the previous data sequence, 3, 4, 5, 6, 7.
15, 20, 25, 30, and 35: what's the mean, variance, and standard deviation?
The mean is obviously 25. In fact, every element here is 5 times as large as the corresponding element of the sequence 3, 4, 5, 6, 7. Now, how does this affect the variance? Well, when we compute the differences from the mean, they're going to be 5 times as large. When we square them, they will be 25 times as large. As a result, the variance is 25 times as large as before: 25 × 2 = 50. We can quickly check this. There is no difference for the 25 in the middle. 20 and 25 are 5 apart, so the square is 25, and the same holds for 30. 15 and 25 are 10 apart, so the square is 100, and the same holds for 35. Adding those up, we get 250, and dividing by 5 gives back our 50. The standard deviation is about 7.071, exactly 5 times as large as before: since the variance is 25 times as large, its square root is √25 = 5 times as large. You can easily verify that 7.071 is the square root of 50.
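To check the three quiz answers at once, here is a short sketch using Python's `statistics` module (an assumption on tooling; the lesson works by hand). Shifting the data moves only the mean, while multiplying it by 5 multiplies the variance by 25 and the standard deviation by 5.

```python
import statistics

quizzes = {
    "3..7": [3, 4, 5, 6, 7],
    "8..12": [8, 9, 10, 11, 12],     # the first sequence shifted by +5
    "15..35": [15, 20, 25, 30, 35],  # the first sequence multiplied by 5
}

results = {}
for label, data in quizzes.items():
    mu = statistics.mean(data)
    var = statistics.pvariance(data)  # divides by N, as in the lesson
    sd = statistics.pstdev(data)      # square root of the variance
    results[label] = (mu, var, sd)
    print(f"{label}: mean={mu}, variance={var}, std dev={sd:.3f}")
```

The printed means should be 5, 10, and 25, with variances 2, 2, and 50.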
Here is an interesting data sequence: 3, 3, 3, 3, 3. What is the mean, and what are the variance and the standard deviation?
The mean is obviously 3, the variance will be 0, and the standard deviation will be 0. The reason is that when you subtract 3 from each item, you have only 0s left. Their squares are 0s, and the sum of those is 0. If you then compute the variance, it's 0. The square root of that is 0 as well, which means there is no spread at all in the data.
In particular, consider a single data item as our sequence: just 4. What do you think are the mean, the variance, and the standard deviation?
The mean is 4, and the variance and the standard deviation are 0. For any single data item, the variance and the standard deviation are always 0. You got it?
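A quick check of the two zero-spread cases, assuming Python's `statistics` module. With the lesson's divide-by-N convention (`pvariance`, `pstdev`), a single data point is allowed and gives 0.

```python
import statistics

constant = [3, 3, 3, 3, 3]
single = [4]

for data in (constant, single):
    # Every item equals the mean, so every deviation, and every squared deviation, is 0.
    print(data,
          "mean:", statistics.mean(data),
          "variance:", statistics.pvariance(data),
          "std dev:", statistics.pstdev(data))
```

By contrast, `statistics.variance`, which divides by N - 1 instead of N, needs at least two data points.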
So, if your data falls onto, say, a curve like this, and this here is your mean, then you can think of the spread, this width over here, as your standard deviation. The formula for the mean is the now-familiar one: the sum of the data points divided by the size of the data set. The variance, often written σ², is the average of the squared differences between each data point and the mean. From that it follows that the standard deviation σ is just the square root of σ². These are the formulas that you just applied, and I hope you recognize them.
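Written out as equations, with N data items X_1 through X_N, the three formulas described above are:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(X_i - \mu\right)^2, \qquad \sigma = \sqrt{\sigma^2}$$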
Here are our equations again for the mean and the variance. They should now look very familiar. The problem with them is that they require two passes through the data. First, I have to go through all the data and compute the mean. I do this by summing up all the data and dividing the sum by the total number of data items. For that, I maintain two things: the total number of data items N, which I increment every time I see a new item, and the sum of all Xᵢ, which I can easily add up as I go through the data. Once I've done this, I know µ and can finally plug it into the variance formula, but now I have to go through the data again to sum up the (Xᵢ - µ)² terms before I finally get the variance.
Now I want to teach you a trick in which we don't maintain the sum of (Xᵢ - µ)² but instead maintain just the sum of Xᵢ². The nice thing about this is that if you maintain these three things, you only need a single pass. As we go through the data, we can maintain the number of data items N, the sum of the data items ΣXᵢ, and, simultaneously, the sum of the squared data items ΣXᵢ². But it's not obvious that from these three things alone you can figure out what σ² is. Let's play a game. The way the game works is that I put a number of mathematical expressions here on the right side. I'll start the derivation, leave little gaps, and ask you to pick one of those expressions and plug it into the equation. You can use each expression only once, and you won't need all of them, just a subset. Let me do the first step by hand and rewrite σ² by expanding the square (Xᵢ - µ)², that is, by multiplying things out. So, help me: which of those expressions do you think goes in the middle?
The correct one is -2Xᵢµ, the mixed term in this multiplication: (Xᵢ - µ)² = Xᵢ² - 2Xᵢµ + µ². Let me take it, plug it in right here, and remove it from the list on the right side.
Now I break this sum into individual sums, one for each of the three terms. That gives me (1/N)ΣXᵢ² minus something, which you pick from the right side, times ΣXᵢ. What do you think it will be when I break out the middle term as its own component of the equation down here? Just look at this term.
The answer is that you have to keep the 1/N, the 2, and the µ, so it's 2µ/N. This is the one; let me remove it from the list.
And now let's look at the final term. Which of these expressions fits on the right side here?
And it's µ². The third sum is (1/N)Σµ²: inside the Σ we add µ² N times, which gives Nµ², and dividing by N gives us µ².
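The instructor omitted a step here, and if your algebra is rusty you might be lost. Here's a more detailed derivation of this part.
You can break the summation into three separate summations as follows:
$$\frac{1}{N}\sum_{i=1}^{N}\left(X_i^2 - 2X_i\mu + \mu^2\right) = \frac{1}{N}\sum_{i=1}^{N}X_i^2 - \frac{2\mu}{N}\sum_{i=1}^{N}X_i + \frac{1}{N}\sum_{i=1}^{N}\mu^2$$
$\sum_{i=1}^{N}\mu^2$ is the same as $N\mu^2$ (you're adding the mean squared $N$ times). Therefore:
$$\frac{1}{N}\sum_{i=1}^{N}\mu^2 = \frac{N\mu^2}{N} = \mu^2$$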
Rewriting this gives me an expression like this, in which I've left out exactly one expression. Can you find it? This one requires some thought, so be really careful.
If we look at what's missing here, it's (1/N)ΣXᵢ. We know what this is: it's the mean. So what's missing is µ. Let me take that µ and move it over here, which gives (1/N)ΣXᵢ² - 2µ² + µ² = (1/N)ΣXᵢ² - µ².
Minus 1/N² times another expression you'll find on the right side.
And the expression here is (ΣXᵢ)². Since µ = (1/N)ΣXᵢ, µ² is the square of the sum ΣXᵢ with the normalization factor 1/N² pulled out in front, which I put over here.
So the final formula is σ² = (1/N)ΣXᵢ² - (1/N²)(ΣXᵢ)². Notice that we didn't use any of the remaining expressions on the right, so let me just take them off. Notice also that this calculation of σ² no longer relies on the mean µ. It solely uses these terms: ΣXᵢ², N (and N², which is easily calculated from N), and ΣXᵢ. Be careful to tell the two sums apart: in ΣXᵢ² we square each element and then sum, whereas in (ΣXᵢ)² we sum all the elements first and then square the total. So the running statistics N, ΣXᵢ, and ΣXᵢ² are sufficient to compute the variance in a single pass through the data.
Here's the complete derivation, all steps together in one place:
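Let's expand the squared term inside the summation:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu\right)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i^2 - 2X_i\mu + \mu^2\right)$$
That's a sum of three terms inside a summation, so we can break the result of the expansion into three summations:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}X_i^2 - \frac{1}{N}\sum_{i=1}^{N}2X_i\mu + \frac{1}{N}\sum_{i=1}^{N}\mu^2$$
The terms $2$ and $\mu$ don't depend on $i$ and therefore can be factored out of the middle summation (in the same fashion we did for $\frac{1}{N}$):
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}X_i^2 - 2\mu\,\frac{1}{N}\sum_{i=1}^{N}X_i + \frac{1}{N}\sum_{i=1}^{N}\mu^2$$
$\frac{1}{N}\sum_{i=1}^{N}X_i$ is the definition of $\mu$, so the middle term becomes $-2\mu^2$.
$\sum_{i=1}^{N}\mu^2 = N\mu^2$. That's $\mu^2$ summed $N$ times. Therefore:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}X_i^2 - 2\mu^2 + \frac{N\mu^2}{N} = \frac{1}{N}\sum_{i=1}^{N}X_i^2 - \mu^2 = \frac{1}{N}\sum_{i=1}^{N}X_i^2 - \frac{1}{N^2}\left(\sum_{i=1}^{N}X_i\right)^2$$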
Here are our formulas again for µ and σ². For the data sequence 3, 4, 5, 6, 7, give me the three statistics we need to compute them, in these boxes here.
N is 5, ΣXᵢ is 25, and ΣXᵢ² is 135, which is 9 + 16 + 25 + 36 + 49. This time we don't keep track of the mean as we did before.
We will now plug these into the formulas above. Well, giddy-up.
And the answer is that the mean is 5, as before: 25/5 = 5. σ² is a bit more difficult to compute. ΣXᵢ² divided by N is 135/5 = 27. From that we subtract 1/N², which is 1/25, times (ΣXᵢ)², which is 25² = 625; and 625 divided by 25 is 25. The difference between 27 and 25 is 2. It's the same variance we had before, just computed a little differently.
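A minimal single-pass sketch of the running-counter approach, assuming the lesson's formula σ² = (1/N)ΣXᵢ² - (1/N²)(ΣXᵢ)². The class name `RunningVariance` and its methods are hypothetical, chosen for illustration.

```python
class RunningVariance:
    """Hypothetical helper: keep N, sum(x), and sum(x^2) while reading the data once."""

    def __init__(self):
        self.n = 0         # number of items seen so far
        self.total = 0     # running sum of the items
        self.total_sq = 0  # running sum of the squared items

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def variance(self):
        # (1/N) * sum(x^2) - (1/N^2) * (sum(x))^2
        return self.total_sq / self.n - self.total ** 2 / self.n ** 2


stats = RunningVariance()
for x in [3, 4, 5, 6, 7]:  # a single pass; the data is never revisited
    stats.add(x)

print(stats.n, stats.total, stats.total_sq)  # 5 25 135
print(stats.mean(), stats.variance())        # 5.0 2.0
```

One design caveat: with large floating-point values, subtracting two nearly equal quantities can lose precision. Welford's online algorithm is a commonly used, more numerically stable single-pass alternative, though the lesson does not cover it.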
So, wow, you now understand a lot. You understand the variance, which is the squared spread of the data. You understand the standard deviation, which is the same without the square. And you know another way to compute all of those in a single pass through the data, using only a few running counters. Let's go deeper; you're really deep into statistics now. So let me ask you two tricky questions. This is a secret, please don't tell anybody, but I'm actually considering, at least hypothetically, giving everybody a raise, and what I'd like to ask is what effect the raise has on the mean and standard deviation of the distribution of salaries in the company. I'm considering two types of raises: a fixed amount of $1000, and what's called a relative raise of 20%. Each of these changes the mean and/or the standard deviation to potentially different values, which we'll call µ′ and σ′, and the change is either multiplicative, which would be the factor over here, or additive, which would be the offset over here. If there is no multiplicative change, just put 1 as the factor, and if there is no additive change, put 0 as the offset. Can you guess what effect adding $1000 to every salary has on the mean and the standard deviation?
The mean shifts upwards by $1000, because we add this amount to every salary, so the factor is 1 and the offset is $1000: the new mean is the old mean plus $1000. The standard deviation is more interesting. Let's look at the variance, which is defined as σ² = (1/N)Σ(Xᵢ - µ)². For the new variance, we add $1000 to each individual salary and $1000 to the mean: σ′² = (1/N)Σ((Xᵢ + 1000) - (µ + 1000))². As you can see, the $1000 terms cancel out, and you're left with the formula for the old variance. Hence the standard deviation is also unchanged, which means you put a factor of 1 and an offset of 0.
Let's do the same for the 20% relative raise and see how it affects the mean and the standard deviation. Please give your best guesses here.
The new mean is 1.2 times the old mean, because a 20% raise multiplies every salary by 1.2. So the factor is 1.2 and the offset is 0: we don't add a constant amount of money per person; the amount varies, so the change is multiplicative. The variance is more interesting. In our new variance, we multiply each salary by a factor of 1.2, and the same is true for the mean: σ′² = (1/N)Σ(1.2Xᵢ - 1.2µ)². We can bring the 1.2 outside the brackets, where it becomes (1.2)² because the bracket is squared, and as a constant factor it can then be moved outside the sum, giving σ′² = (1.2)²σ². This is true for the variance but not for the standard deviation. The standard deviation is the square root of the variance, so it grows by a factor of 1.2 only, with an offset of 0.
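A quick numeric check of both answers, using a small list of made-up salaries (the numbers are purely hypothetical) and Python's `statistics` module with the divide-by-N standard deviation.

```python
import math
import statistics

salaries = [40_000, 52_000, 61_000, 75_000, 98_000]  # hypothetical example data

fixed = [s + 1000 for s in salaries]    # everyone gets $1000 more
relative = [s * 1.2 for s in salaries]  # everyone gets a 20% raise

mu, sd = statistics.mean(salaries), statistics.pstdev(salaries)

# Fixed raise: mean shifts by 1000, standard deviation is unchanged.
print("fixed raise: mean shift =", statistics.mean(fixed) - mu,
      "| same std dev:", math.isclose(statistics.pstdev(fixed), sd))

# Relative raise: mean and standard deviation are both scaled by 1.2.
print("20% raise: mean x1.2:", math.isclose(statistics.mean(relative), 1.2 * mu),
      "| std dev x1.2:", math.isclose(statistics.pstdev(relative), 1.2 * sd))
```

`math.isclose` is used for the relative raise because 1.2 is not exactly representable as a binary float.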
I finally want to teach you about the concept of a standard score. The basic idea is that for any Gaussian, no matter what its mean and variance are, you can state how far in or out a point x is. Let me give you an example. Suppose this is a point x, and now I give you a different Gaussian that's much wider, with a different mean and a different standard deviation. If I ask you for the point in the right Gaussian that corresponds to x, is it here, here, or here? The answer you probably picked is this one over here. Even though another point is about as far from the mean in absolute distance, this Gaussian is wider, so the point you picked is the more logical counterpart relative to its mean and spread. This point corresponds pretty much to x. Now that's interesting: it's called the standard score. Given the point x, you subtract the mean and divide by the standard deviation, z = (x - µ)/σ, and that gives you the standard score. Got it? Let's see. Here's a data set: 3, 4, 5, 7, and 6. The mean is 5, and the standard deviation is √2. For the specific value x = 2, what do you think is the standard score relative to the Gaussian that fits these data points?
And the answer is approximately -2.12. We get this by applying the formula: (2 - 5)/√2 = -3/√2. When you punch this into a calculator, this is the number you get.
Let's do this again. Now take the data point x = 5. What do you think is z, the standard score?
And this was an easy one: the answer is 0. The data point 5 minus the mean 5 gives 0, and 0 divided by any nonzero number remains 0.
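A minimal sketch of the standard score for the two quiz points, assuming Python's `statistics` module with the divide-by-N standard deviation used throughout the lesson. The function name `standard_score` is illustrative.

```python
import math
import statistics

data = [3, 4, 5, 7, 6]
mu = statistics.mean(data)       # 5
sigma = statistics.pstdev(data)  # sqrt(2), using the lesson's divide-by-N convention


def standard_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


print(round(standard_score(2, mu, sigma), 2))  # -2.12
print(standard_score(5, mu, sigma))            # 0.0
```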
We talked about standard deviation, variance, and standard score. I guess this makes this lesson somewhat standard. Never mind. Let's move to the next unit.