28.  Programming Tests and Intervals (Optional)


These are draft notes extracted from subtitles.

01 Confidence Intervals

Here's our optional unit. It involves programming, and what I want is really simple: I want to input a sample and a hypothesis, and out should come a simple yes or no on whether I should accept the hypothesis. This piece of code should do just that, and for simplicity I assume 95% confidence and two-sided tests.

Here's what I'll do. I'm going to give you functions you already know. You've already programmed the mean function; here it is again. Notice the call to float, in case the list doesn't contain floats. It is an artifact of type conversion in Python (in Python 2, dividing one integer by another truncates the result), but never mind. Then we have the variance function that you programmed before, here written in a very compact way. I should also really implement the t-table, but I was lazy, so I'll just return 1.96, even though we know that's not quite correct.

I'm going to give you a list. This was the list of heights in my basketball club, and I print the mean. It is hard to see, but I should indeed get 201.5. Give it a try.

What I want you to implement now is the function conf, which is the plus/minus term in the confidence interval below and above the mean. If I run it for this data sequence over here, from 199 to 204, I indeed get 1.366544. So back in your code, we left the function conf open; please insert your code right here. You can do this in one line.

#Write 
from math import sqrt

def mean(l):
    return float(sum(l))/len(l)

def var(l):
    m = mean(l)
    return sum([(x-m)**2 for x in l])/len(l)

def factor(l):
    return 1.96

def conf(l):
    #Insert your code here
    pass  # placeholder so the file runs before you add your code

l = [21]*4+[24]*6+[26]*7+[29]*11+[40]*2
print(conf(l))
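
Before filling in conf, it can help to check that the helper functions behave as described. The sketch below assumes the height list from the lecture is the six integers 199 through 204 (the notes only say the sequence runs "from 199 to 204"); the name heights is ours, not the lecture's.

```python
def mean(l):
    return float(sum(l)) / len(l)

def var(l):
    # Variance as defined in these notes: divide by n, not n - 1
    m = mean(l)
    return sum([(x - m) ** 2 for x in l]) / len(l)

# Assumption: the basketball heights are exactly these six values
heights = [199, 200, 201, 202, 203, 204]

print(mean(heights))  # 201.5
print(var(heights))
```

If the assumed list is right, the printed mean agrees with the 201.5 quoted in the lecture.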

02 Confidence Intervals Solution

Here's my answer: we take the factor and multiply it by the square root of the variance divided by the length of the list. This is exactly the formula I gave you. When you run this, you get the desired answer. Finally, I will add the hypothesis, for example h = 200.

#Write 
from math import sqrt

def mean(l):
    return float(sum(l))/len(l)

def var(l):
    m = mean(l)
    return sum([(x-m)**2 for x in l])/len(l)

def factor(l):
    return 1.96

def conf(l):
    #Insert your code here
    # plus/minus term: factor * sqrt(variance / n)
    return factor(l)*sqrt(var(l)/len(l))

l = [21]*4+[24]*6+[26]*7+[29]*11+[40]*2
print(conf(l))
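
As a quick check of this solution against the number quoted in the lecture, here is a minimal sketch that runs conf on the height data. It assumes the heights are the six integers 199 through 204; the variable names heights, m and c are ours.

```python
from math import sqrt

def mean(l):
    return float(sum(l)) / len(l)

def var(l):
    m = mean(l)
    return sum([(x - m) ** 2 for x in l]) / len(l)

def factor(l):
    return 1.96  # fixed 95% two-sided factor, as in the notes

def conf(l):
    return factor(l) * sqrt(var(l) / len(l))

# Assumption: the lecture's height list is 199, 200, ..., 204
heights = list(range(199, 205))

m = mean(heights)
c = conf(heights)
print(round(c, 4))   # 1.3665
print(m - c, m + c)  # lower and upper end of the interval
```

The rounded value agrees with the 1.366544 mentioned in the lecture to the digits shown.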

03 Hypothesis Test

I want you to write a function called test that accepts as input a sample l and a hypothesis h, and returns True if we believe the null hypothesis in a two-sided test. Otherwise, it returns False. Insert your code right here.

#Complete the test function to perform a hypothesis test 
#on list l under the null that the mean is h

from math import sqrt

def mean(l):
    return float(sum(l))/len(l)

def var(l):
    m = mean(l)
    return sum([(x-m)**2 for x in l])/len(l)

def factor(l):
    return 1.96


def conf(l):
    return factor(l) * sqrt(var(l) / len(l))


def test(l, h):
    #Insert your code here
    pass  # placeholder so the file runs before you add your code

l = [21]*4+[24]*6+[26]*7+[29]*11+[40]*2
print(test(l, 26))

04 Hypothesis Test Solution

Here is my solution. test is a function of the data l and the hypothesis h. We first compute the mean and then the plus/minus term of the confidence interval. If our hypothesis differs from the mean by no more than the size of the confidence interval, we return True. This is a two-sided test, hence the absolute value of the difference between h and m. The logical expression at the bottom becomes False only if h deviates from m by more than c.

Down here is the output of our example from class: 201.5 was the mean, 1.366 was the size of the interval, and we got False. We don't believe H0. If I change my hypothesis to 201 and run the test again, we now get True as the result of this test.

Great. You programmed confidence intervals and you programmed a hypothesis test. That is actually really cool. If you programmed it, I bet you now really know how it works.

#Complete the test function to perform a hypothesis test 
#on list l under the null that the mean is h

from math import sqrt

def mean(l):
    return float(sum(l))/len(l)

def var(l):
    m = mean(l)
    return sum([(x-m)**2 for x in l])/len(l)

def factor(l):
    return 1.96


def conf(l):
    return factor(l) * sqrt(var(l) / len(l))


def test(l, h):
    #Insert your code here
    m = mean(l)
    c = conf(l)
    # two-sided: accept h when it lies within c of the sample mean
    return abs(m-h) <= c


l = [21]*4+[24]*6+[26]*7+[29]*11+[40]*2
print(test(l, 26))