
19.  Central Limit Theorem Programming (Optional)


These are draft notes extracted from subtitles.


01 Programming Flips

You may have asked why I dragged you through all of this binomial distribution material and flipped so many coins. The reason is that you are now going to move towards what is perhaps the deepest insight in all of statistics. It's called the central limit theorem, and the way I want you to get there is through a programming exercise. Now, I told you that all the programming is optional and you can totally skip this one, but I beg you to stay with me. What you're about to see is perhaps the most interesting way to understand the central limit theorem and the statistics of large numbers.

In assignment #1, I literally want you to flip a coin 1000 times. Once you've done this, I want you to compute the mean and the standard deviation of the outcomes. Flipping a coin is a random event; it gives us 0s and 1s as outcomes. If we were to do this 1000 times for a fair coin, you would expect the mean to be about 0.5. You probably have no clue what to expect for the standard deviation.

There are a couple of things I want to give you. In your programming environment, you'll find the functions mean, variance, and stddev, as you've practiced them before. All I added is a little bit of code to make sure the result is of type float, which is what you need for computing the mean. Otherwise it might be of type integer, and then these calculations all go wrong. The same pattern appears in the computation of the variance. Other than the float type conversion, it's exactly what I've shown you before.

I now want you to implement the function flip, which takes as its argument the number of coin flips you want to do, 1000 in this case. Then use the functions mean and stddev to compute the mean and the standard deviation of the resulting sequence of outcomes. This will be a list filled with 0s and 1s.

The thing to know is that the function random.random (that's "random" twice, with a dot in the middle) gives you a random value that sits between 0 and 1. Every time you call this function you get a different random value, which is nice, because you just have to call it 1000 times to get 1000 samples. But those values lie in the interval between 0 and 1, and you want to turn them into coin flips, so what you have to do is use the expression random.random() > 0.5. This expression gives you True or False, which for our purposes is the same as 1 and 0. It gives you True if the random value happens to be larger than 0.5 and False if it's smaller. So the assignment is to evaluate this 1000 times, make a list of the 1000 outcomes, and put that code into the function flip so that it returns that list. Then you're done.

Here's a typical outcome for this code. If I run it, the mean might be 0.484, along with a standard deviation. If I run it again, I get a different mean of 0.51 and a different standard deviation.
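Two details from the description above can be checked in a few lines. This is a minimal sketch and not part of the course code: the names flips, int_mean, float_mean, r, and coin are made up for illustration, and floor division (//) stands in for the integer division that the float conversion is meant to avoid.

```python
import random

# A tiny hand-written sequence of coin flips (1 = heads, 0 = tails).
flips = [1, 0, 1, 1]

# Floor division throws away the fractional part, which is what goes
# wrong when the mean is computed with integers only.
int_mean = sum(flips) // len(flips)

# Converting to float first keeps the fraction.
float_mean = float(sum(flips)) / len(flips)

print(int_mean)    # 0
print(float_mean)  # 0.75

# random.random() returns a float in [0, 1); comparing it with 0.5
# yields True or False, and Python treats True as 1 and False as 0.
r = random.random()
coin = r > 0.5
print(r, coin, int(coin))
print(True + True + False)  # 2
```

Because True and False count as 1 and 0 in arithmetic, a list of such comparison results can be passed straight to sum without converting it first.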

#Write a function flip that simulates flipping n fair coins.
#It should return a list representing the result of each flip as a 1 or 0
#To generate randomness, you can use the function random.random() to get
#a number between 0 and 1. Checking if it's less than 0.5 can help you
#transform it to be 0 or 1

import random
from math import sqrt

def mean(data):
    return float(sum(data))/len(data)

def variance(data):
    mu=mean(data)
    return sum([(x-mu)**2 for x in data])/len(data)

def stddev(data):
    return sqrt(variance(data))


def flip(N):
    #Insert your code here

N=1000
f=flip(N)

print(mean(f))
print(stddev(f))

02 Programming Flips Solution

And here's my answer. It's a one-liner. It builds a list of 1000 things, and this is the beauty of Python. There are more complicated ways to write it; this uses the same construct as in the variance function before, just a little more compact. I run the test random.random() > 0.5, which gives me True or False, and I want to do this 1000 times. Doing it 1000 times is what "for x in range(N)" does, where N is 1000 and range(N) runs from 0 to 999, so x takes 1000 different values. The x itself isn't used here, because the random coin flip doesn't care which position in the sequence it has; the expression is the same every single time. It just means I run this procedure 1000 times, collect the results in the bracketed list, and return it.

Specifically, if we were to print out f and hit the run button, what I get is a list of 1000 items, each of them False or True. It makes for a beautiful wallpaper, doesn't it?

#Write a function flip that simulates flipping n fair coins.
#It should return a list representing the result of each flip as a 1 or 0
#To generate randomness, you can use the function random.random() to get
#a number between 0 and 1. Checking if it's less than 0.5 can help you
#transform it to be 0 or 1

import random
from math import sqrt

def mean(data):
    return float(sum(data))/len(data)

def variance(data):
    mu=mean(data)
    return sum([(x-mu)**2 for x in data])/len(data)

def stddev(data):
    return sqrt(variance(data))


def flip(N):
    #Insert your code here
    return [random.random() > 0.5 for x in range(N)]

N=1000
f=flip(N)

print(mean(f))
print(stddev(f))
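
As a hedged aside on what to expect for the standard deviation: for a list that contains only 0s and 1s, squaring a value leaves it unchanged, so the variance works out to m(1 - m), where m is the mean. For a fair coin this puts the standard deviation close to 0.5. The sketch below checks this numerically. The helper name flip_ints, the seed value, and the use of int(...) to store 1 and 0 instead of True and False are illustrative choices, not part of the course solution.

```python
import random
from math import sqrt

def mean(data):
    return float(sum(data)) / len(data)

def stddev(data):
    mu = mean(data)
    return sqrt(sum((x - mu) ** 2 for x in data) / len(data))

def flip_ints(n):
    # Same test as in flip, but stored as the integers 1 and 0.
    return [int(random.random() > 0.5) for _ in range(n)]

random.seed(0)  # arbitrary seed so repeated runs give the same flips
f = flip_ints(1000)
m = mean(f)

print(m)
print(stddev(f))
print(sqrt(m * (1 - m)))  # agrees with stddev(f) up to rounding error
```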

03 Sets Of Flips

With this in place, here comes the really interesting question. It's assignment number two, and again it's a programming assignment, so feel free to skip it. Now that we have the function flip, which gives me a list of a thousand outcomes from which I can derive things like the mean, run this thing itself a thousand times. Each time you get a different mean: mean 0, mean 1, and so on, all the way to mean 999. These means are obviously continuous values between 0 and 1.

I give you the same functions as before: mean, variance, stddev, and flip. As I scroll down, I find the function sample. I want you to put in code there so that when I call sample with the same N, it runs the flip experiment a thousand times, computes the mean every single time, and assembles a list of all the means. That list ends up in the variable called outcomes.

Because the means are continuous, I can do a histogram plot. It looks better with many bins, so the notation nbins=30 gives me 30 bins. To give you a feel for what to expect, this is a typical histogram I get as a result. It's really beautiful. If I increase N to 2000, I get another histogram. I apologize that some numbers are a little illegible, but the center is at 0.5 and it falls off to smaller numbers to the left and to the right. You can think of it as a distribution over the mean outcomes of large numbers of coin flips, and it has an interesting shape. So go ahead and program it and see if you can reproduce these results.

#Write a function sample that simulates N sets of coin flips and
#returns a list of the proportion of heads in each set of N flips
#It may help to use the flip and mean functions that you wrote before

import random
from math import sqrt
from plotting import *

def mean(data):
    return float(sum(data))/len(data)

def variance(data):
    mu=mean(data)
    return sum([(x-mu)**2 for x in data])/len(data)

def stddev(data):
    return sqrt(variance(data))


def flip(N):
    return [random.random()>0.5 for x in range(N)]

def sample(N):
    #Insert your code here

N=1000
outcomes=sample(N)
histplot(outcomes,nbins=30)

print(mean(outcomes))
print(stddev(outcomes))

04 Sets Of Flips Solution

And my answer is quite simple. Again, it runs my experiment 1000 times using "for x in range(N)". It assembles a new list, and each element of that list is the mean of the list produced by flip. Every single time I run flip, it gives me 1000 0s and 1s (or Trues and Falses), and using the function mean I compute the mean of that. But now I have an outer loop where I do this calculation of the mean 1000 times and assemble the results into a new list. That list is continuous-valued. I can print it out by saying print outcomes after generating it. What I see is a list of 1000 numbers that all hover around 0.5, some a bit below it and some a bit above, such as 0.515. These are the empirical means for these sequences of 1000 coin flips. Notice that I effectively flipped 1,000,000 coins here, and the corresponding histogram plots the frequency of the resulting means.

#Write a function sample that simulates N sets of coin flips and
#returns a list of the proportion of heads in each set of N flips
#It may help to use the flip and mean functions that you wrote before

import random
from math import sqrt
from plotting import *

def mean(data):
    return float(sum(data))/len(data)

def variance(data):
    mu=mean(data)
    return sum([(x-mu)**2 for x in data])/len(data)

def stddev(data):
    return sqrt(variance(data))


def flip(N):
    return [random.random()>0.5 for x in range(N)]

def sample(N):
    #Insert your code here
    return [mean(flip(N)) for x in range(N)]

N=1000
outcomes=sample(N)
histplot(outcomes,nbins=30)

print(mean(outcomes))
print(stddev(outcomes))
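
The standard deviation printed for the means is much smaller than the value of roughly 0.5 seen for a single sequence of flips. A standard result for the mean of N independent flips is that its standard deviation equals the single-flip standard deviation divided by sqrt(N), which is 0.5/sqrt(N) for a fair coin. The sketch below compares that prediction with a simulation. It uses smaller values of N and a fixed seed so that it runs quickly; both are arbitrary choices for illustration, and the variable name predicted is hypothetical.

```python
import random
from math import sqrt

def mean(data):
    return float(sum(data)) / len(data)

def stddev(data):
    mu = mean(data)
    return sqrt(sum((x - mu) ** 2 for x in data) / len(data))

def flip(N):
    return [random.random() > 0.5 for x in range(N)]

def sample(N):
    return [mean(flip(N)) for x in range(N)]

random.seed(1)  # arbitrary seed for repeatable output

for N in (100, 400):
    outcomes = sample(N)
    predicted = 0.5 / sqrt(N)  # expected spread of the means for a fair coin
    print(N, mean(outcomes), stddev(outcomes), predicted)
```

Under this relationship, quadrupling N should roughly halve the spread of the means, so a histogram for a larger N would be expected to look narrower around 0.5.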

05 Bell Curve

The thing that is striking is that in this binomial distribution, the frequency of outcomes is centered around the expected outcome of 0.5 and falls off according to a funny-looking curve. It is often called a bell curve, because it looks like the outline of what could be the world's largest church bell. The significance of this bell curve and its relationship to what's called the central limit theorem will be discussed in the next unit.
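
The bell shape can be seen even without a plotting library. The following is a minimal sketch that bins the means by hand and prints one row of # characters per bin. The function text_histogram, its parameters, the bin count, the sample size, and the seed are all hypothetical choices for illustration; it is not the histplot function used in the listings above.

```python
import random

def mean(data):
    return float(sum(data)) / len(data)

def flip(N):
    return [random.random() > 0.5 for x in range(N)]

def sample(N):
    return [mean(flip(N)) for x in range(N)]

def text_histogram(values, nbins=15):
    # Count how many values fall into each of nbins equal-width bins.
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * nbins
    for v in values:
        i = min(int((v - lo) / width), nbins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return lo, width, counts

random.seed(2)  # arbitrary seed for repeatable output
lo, width, counts = text_histogram(sample(500))
for i, c in enumerate(counts):
    # Left edge of the bin, then one # for every two values (rounded down).
    print("%.3f %s" % (lo + i * width, "#" * (c // 2)))
```

The rows near the middle of the printout should be the longest and the rows at the top and bottom the shortest, which is the bell curve turned on its side.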
