In this unit, I want to show you a problem that will illustrate how deep statistics can actually be. Statistics is not just a superficial field. In fact, in this unit I will show you a problem that will blow your mind. I promise you will think about this for a long time to come. So, let's just dive in.
The problem I'd like to tell you about is motivated by an actual study at the University of California, Berkeley, which many years back wanted to know whether its admissions procedure is gender biased. They looked at various admission statistics to understand whether their admissions policies had a preference for a certain gender. And while the numbers I'll be giving you are not the exact same that UC Berkeley found, the paradox is indeed the same and is often called "Simpson's Paradox." I'm just giving you a simplified version of the problem. Here is the data. Among male students, we find that from 900 applicants in major A, 450 are admitted. Please tell me what the acceptance rate is in percent.
Obviously, it's 50%.
In a second major, B, 100 students applied, of which 10 were admitted. What is the acceptance rate?
And the answer is 10%.
The same statistic was run for female students. Again, I made up the data to illustrate the effect. Females tended to apply predominantly for major B, with 900 applications for major B and just 100 for major A. The university accepted 80 out of 100 applications in major A and 180 out of 900 in major B. Please tell me the rate of acceptance in percent for major A for the female student population.
Of course it's 80%--80/100.
Please do the same for the major B in the female population over here.
180 is 20% of 900.
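The four per-major rates computed so far can be checked with a few lines of Python. This is only a minimal sketch: the dictionary layout and the helper name `acceptance_rate` are illustrative choices, and the counts are the made-up numbers from this unit, not the real UC Berkeley data.

```python
# Made-up admissions data from this unit: (applicants, admitted) per gender and major.
data = {
    "male": {"A": (900, 450), "B": (100, 10)},
    "female": {"A": (100, 80), "B": (900, 180)},
}


def acceptance_rate(applicants, admitted):
    """Return the acceptance rate in percent (hypothetical helper)."""
    return 100 * admitted / applicants


# Acceptance rate for every gender/major combination.
per_major = {
    gender: {major: acceptance_rate(*counts) for major, counts in majors.items()}
    for gender, majors in data.items()
}

for gender, rates in per_major.items():
    for major, rate in rates.items():
        print(f"{gender:6s} major {major}: {rate:.0f}%")
```

Within each major, the female rate comes out higher than the male rate, which matches the answers given above.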
So, just looking at these numbers for the two different majors, would we believe--in terms of the acceptance rate--is there a gender bias? Yes or no?
And I would say yes, in part because the acceptance rate is so different for the different student populations, even though the numbers are relatively large. So, it doesn't seem just like random deviations. But the thing that will blow your mind is a different question.
Who is being favored--the male students or the female students? And looking at the data alone, it makes sense to say the female students are favored because for both majors, they have a better admission rate than the corresponding male students. But now, let's do the trick. Let's look at the admission statistics independent of the major. So, let's talk about both majors together, and I wonder how many male students applied. And of course, the answer is 1000. How many were admitted?
And the answer is 460.
So, what is the admissions rate for male students across both majors in percent?
And the answer is, of course, 46%. It's 460/1000 x 100%.
Now, do the same for the female student population: again 1000 applicants, the same number as in the male case, and 260 students admitted. So, what's the admission rate in percent?
The answer is 26%.
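The same regrouping can be sketched in Python by summing applicants and admitted students over both majors before dividing. The function name `overall_rate` and the data layout are assumptions made for illustration, and the numbers are again the made-up ones from this unit.

```python
# Made-up admissions data from this unit: (applicants, admitted) per gender and major.
data = {
    "male": {"A": (900, 450), "B": (100, 10)},
    "female": {"A": (100, 80), "B": (900, 180)},
}


def overall_rate(majors):
    """Pool all majors for one gender and return the acceptance rate in percent."""
    applicants = sum(applied for applied, _ in majors.values())
    admitted = sum(accepted for _, accepted in majors.values())
    return 100 * admitted / applicants


overall = {gender: overall_rate(majors) for gender, majors in data.items()}

for gender, rate in overall.items():
    print(f"{gender:6s} overall: {rate:.0f}%")
```

Pooled this way, the male rate is 46% and the female rate is 26%, even though nothing but the grouping has changed.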
So, across both majors, I'm asking you the same question again now. Who is actually being favored? Males or females?
And surprisingly, when you look at both majors together, you find that males have a much higher admissions rate than females. I'm not making this up. These numbers might be fake, but that specific effect was observed at the University of California at Berkeley many years ago. But when you look at majors individually, then you find in each major individually the acceptance rate for females trumps that of males, both in the first major and the second major. Going from the individual major statistics to the total statistics, we haven't added anything. We just regrouped the data. So how come, when you do this, what looks like an admissions bias in favor of females switches into admissions bias in favor of males?
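One way to approach this question, which the lecture itself does not spell out, is to notice that each overall rate is a weighted average of the per-major rates, weighted by the share of applicants in each major. The sketch below assumes the made-up numbers from this unit; the names `weights` and `weighted_rate` are hypothetical.

```python
# Made-up admissions data from this unit: (applicants, admitted) per gender and major.
data = {
    "male": {"A": (900, 450), "B": (100, 10)},
    "female": {"A": (100, 80), "B": (900, 180)},
}

weights = {}        # share of each gender's applicants that chose each major
weighted_rate = {}  # overall rate rebuilt as a weighted average, in percent

for gender, majors in data.items():
    total = sum(applied for applied, _ in majors.values())
    weights[gender] = {m: applied / total for m, (applied, _) in majors.items()}
    weighted_rate[gender] = sum(
        weights[gender][m] * (100 * accepted / applied)
        for m, (applied, accepted) in majors.items()
    )

for gender in data:
    print(gender, weights[gender], f"{weighted_rate[gender]:.0f}%")
```

In this made-up data, 90% of the male applicants chose major A, where admission is comparatively easy, while 90% of the female applicants chose major B, where it is hard for everyone. The weights differ so strongly between the two groups that the pooled rates flip.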
I showed you this example to illustrate how ambiguous statistics really is. In choosing how to group and present your data, you can majorly impact what people believe to be the case. In fact, a famous saying goes, "I never believe in statistics I didn't doctor myself." I'll let you guess here who this is being attributed to--Mark Twain, Oscar Wilde, or Winston Churchill. Check the Web, and see who invented this famous quote.
And even though I don't think Winston Churchill invented it, in the course of World War II the Germans tried to associate this quote with him to make him less credible. Be that as it may, the key lesson here is that statistics is deep and often manipulated. One of the tricks I'd like to teach you is to be skeptical of statistics, of your own results, of other people's results, and really understand how to turn raw data into decisions or conclusions. I hope that this simple example made you think. Stay tuned when we dive into the basics of statistics, which is probability theory, in the next number of units.