These are draft notes extracted from subtitles.
Let's now talk about pie charts. You all know what a pie is. Here's a birthday pie, and if you look at this pie from above it looks just like this. It's a circle. Now here comes Sebastian and cuts out his first piece of the pie. What results is a pie with a missing piece, and then my wife comes, and she gets just a small piece, but my brother eats a very big piece, so only the following pie is left. We've just made a pie chart, and in statistics you use pie charts to visualize data, specifically relative data, and I'll tell you in a second what that means.
Let's start with an exercise. Suppose we are in an election with two parties, party A and party B. And suppose it's a toss-up, so both parties get the same number of votes: 50% each. Here are three pie charts. Does any of them reflect the outcome of the election? Perhaps the first, the second, the third, or none of the above. Please check exactly one box.
The answer is that the second one is a good representation of this outcome. When I color the pieces of the pie, we see that 50% of the pie falls into one class and the other 50% into the other class. Compare this to the pie over here, which looks like a 75 to 25 split, or the one over here, which is 25 to 75. The second one is the one I would've chosen.
Now, I said that pie charts are good for relative data. To illustrate this, suppose party A got 724,000 votes and party B got 181,000 votes. What percentage of the votes did party A get? What percentage did party B get? Enter your numbers here.
Well, with a little bit of math, we find that party A got 80%. Party A received 724,000 votes, and the total number of votes was 724,000 plus 181,000, or 905,000. So A's share is 724,000 over 905,000, which is exactly 0.8, the same as 80%. It follows that party B got 20%. You get that number by replacing 724,000 with 181,000, party B's count: 181,000 divided by 905,000 is exactly 0.2, because the total of 905,000 is exactly 5 times 181,000.
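The same arithmetic can be sketched in a few lines of Python. This is only an illustration; the names `votes`, `total`, and `shares` are assumptions, not something from the lecture.

```python
# Vote counts from the example above
votes = {"A": 724_000, "B": 181_000}

total = sum(votes.values())  # 905,000 votes cast in total

# Each party's share is its count divided by the total
shares = {party: count / total for party, count in votes.items()}

for party, share in shares.items():
    print(f"Party {party}: {share:.0%}")
# Party A: 80%
# Party B: 20%
```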
Now, given all this, I will draw a number of pie charts. I want you to select the one that most closely resembles the distribution over here. Just for clarity, party A is depicted in red and party B in blue. Please select exactly one of these pie charts.
To me, the best answer is the last one, because its smaller slice most closely resembles 20% of the pie. The first one comes close, since it cuts out a quarter of the pie, but a quarter is 25%. The slice we want is slightly smaller than a quarter, so the last chart best corresponds to the 80/20 split.
Here comes a tricky question. Given a pie chart with an 80% to 20% distribution, I'm now changing the total number of voters. Suppose 23,000 people voted for party B. How many voted for party A, such that the pie chart over here is exactly right, with an 80 to 20 distribution?
The answer is 92,000. You can see this because 80 is exactly 4 times 20. If you took 23,000, divided it by 20%, and multiplied the result by 80%, that would be the same as just multiplying 23,000 by 4, which gives 92,000. What's remarkable about this chart is that it's invariant to the total number of votes. What it really depicts is the relative number of votes: A got many, many more votes than B. It shows this graphically, so you can see it without even studying the numbers. Let's practice this one more time.
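Here is a minimal sketch of that reasoning, assuming a hypothetical helper `votes_for_a` that divides B's count by B's share to get the total and then takes A's share of it. It also checks that the 92,000/23,000 split and the earlier 724,000/181,000 split give the same 4/5 share for A, which is why one pie chart fits both elections.

```python
from fractions import Fraction

def votes_for_a(votes_b, share_a=Fraction(80, 100), share_b=Fraction(20, 100)):
    """Hypothetical helper: votes A needs so that A:B matches share_a:share_b."""
    # Dividing by B's share recovers the total; multiplying by A's share gives A's count.
    # For an 80/20 split this is the same as multiplying votes_b by 4.
    return votes_b / share_b * share_a

a = votes_for_a(23_000)
print(a)  # 92000

# The pie chart depends only on the ratio, not on the total number of votes
small = Fraction(92_000, 92_000 + 23_000)
large = Fraction(724_000, 724_000 + 181_000)
print(small == large)  # True: both are 4/5
```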
Suppose you're taking a Udacity class. I guess you're taking one right now, so let's drop the "suppose". Among the students taking the class with you, you find the following age distribution: from 13 to 19 there are 12,000 students, from 20 to 32 there are 96,000 students, and from 33 on there are 36,000 students. I now want to construct the pie chart with you. Here's my pie. I want you to place the separator for the very first class, ages 13 to 19, which is the blue class. Please check the box on the perimeter of the pie chart that best places the separator. For example, if you check the box over here, …
This one is not that easy to solve. What is the ratio of 12,000 to the total number of students? The total is 12,000 plus 96,000 plus 36,000, which is 144,000, so the ratio is 12,000 over 144,000. That's the same as 12 over 144, which is 1/12. The correct answer would have been the check box over here. This area corresponds to the age group 13 to 19.
Moving on to our dominant age group, please check the box on the perimeter that best represents the separator for the second class.
The answer is the box over here. Some of you might have chosen this box, but I'll tell you in a second why this one is better. This is the resulting pie chart, where the red class is ages 20 to 32 and the remaining black class is the age group 33 and older. Now let's do the math. To find the area the red slice occupies, we divide 96,000 by the total number of students, which gives 96/144. That happens to be the same as 8/12. Now, the 8 mark is the one over here, but I chose the 9 mark. This is the 9 mark: 1, 2, 3, 4, 5, 6, 7, 8, 9. The reason is that one mark has already been used up by the first class, and we add 8 more to it to arrive over here. The red area of the pie really does correspond to 8/12 of the total area. If we now plug in the final class of 36,000 students, which is 36/144 or 3/12, you'll find that this area occupies exactly 3/12 of the total. Hence this is the correct pie chart.
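The separator positions are just running totals of the shares. This sketch assumes, as in the quiz diagram, that the perimeter is divided into 12 equal marks; the names `groups`, `shares`, and `marks` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

# Age groups from the example: (label, number of students)
groups = [("13-19", 12_000), ("20-32", 96_000), ("33+", 36_000)]

total = sum(n for _, n in groups)  # 144,000 students

# Each group's share of the pie as an exact fraction: 1/12, 8/12, 3/12
shares = [Fraction(n, total) for _, n in groups]

# Assuming a 12-mark perimeter, each separator sits at the running total
# of the shares, expressed in twelfths
marks = [int(c * 12) for c in accumulate(shares)]
print(marks)  # [1, 9, 12]: separators at mark 1 and mark 9; mark 12 closes the circle
```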
Let's do this again, once more using our election example, this time with four parties: A, B, C, and D. Here are the election outcomes: party A received 175,000 votes, party B 50,000, party C 25,000, and party D 50,000. In most democratic countries you don't find a distribution in which one party takes such a large majority, but let's say that's the case for our country. If I now draw a pie chart, let's graph party A first, then B, C, and D, as indicated over here. Please check exactly those boxes that mark the separator from one party to the next.
What we find is that party A got 7/12 of the vote, which is the majority, whereas the others received only 2/12, 1/12, and 2/12. The total is 300,000 votes, so each twelfth of the pie is 25,000 votes. So if we go forward 7 pieces in this diagram, 1, 2, 3, 4, 5, 6, 7, we check this box. Another 2 gives us this box. Another 1 gives us this box, and these are the final two. So here's what this chart looks like. Obviously A takes the majority, B reaps this slice over here, C is the smallest party, and D is the one over here.
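A minimal sketch of the same steps, again assuming the 12-mark perimeter from the quiz: find how many votes one twelfth of the pie represents, express each party's slice in twelfths, then step around the circle to get the separator positions. Variable names are illustrative.

```python
# Election results from the example above (party -> votes)
votes = {"A": 175_000, "B": 50_000, "C": 25_000, "D": 50_000}

total = sum(votes.values())  # 300,000 votes
unit = total // 12           # one twelfth of the pie = 25,000 votes

# Sanity check: every slice is a whole number of twelfths in this example
assert total % 12 == 0 and all(v % unit == 0 for v in votes.values())

# Size of each slice in twelfths
twelfths = {p: v // unit for p, v in votes.items()}
print(twelfths)  # {'A': 7, 'B': 2, 'C': 1, 'D': 2}

# Separator positions: step forward around the perimeter, party by party
position, separators = 0, []
for p in "ABCD":
    position += twelfths[p]
    separators.append(position)
print(separators)  # [7, 9, 10, 12]
```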
As a final question, suppose that in a different election, where the same pie chart is correct, we had a total of 240,000 voters, which is the sum of all votes cast. Assuming this pie chart is correct, can you tell me how many votes were cast for each of the parties?
The answer lies in the numbers I just wiped out. With 240,000 votes in total, 1/12 of them is 20,000. Since A got 7/12, A got 140,000, which is 7 times 20,000. Party B got 2/12, or a sixth, which is 40,000. Party C got a disappointing 20,000, and party D the same as party B. If we look at the diagram, it tells you nothing about the absolute numbers. In fact, I can change the absolute numbers as long as the relative percentages stay the same. It does, however, tell you a lot about the distribution of the data. It shows you that A is the dominant party, with more than 50% of the vote, whereas B, C, and D occupy smaller slices, with C's slice half the size of B's or D's. Again, this is called a pie chart. Pie charts are really powerful for representing things like election outcomes, or any data set where we care only about relative outcomes and perhaps have more than two classes.
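Going the other way, from the chart's relative shares back to absolute counts, is a multiplication by the total. The helper `absolute_votes` below is hypothetical; the sketch shows that the same shares give the 240,000-vote counts as well as the original 300,000-vote counts.

```python
from fractions import Fraction

# Slice sizes read off the pie chart, in twelfths
shares = {"A": Fraction(7, 12), "B": Fraction(2, 12), "C": Fraction(1, 12), "D": Fraction(2, 12)}

def absolute_votes(shares, total):
    """Hypothetical helper: turn relative shares into vote counts for a given total."""
    return {party: share * total for party, share in shares.items()}

counts = absolute_votes(shares, 240_000)
print({p: int(c) for p, c in counts.items()})
# {'A': 140000, 'B': 40000, 'C': 20000, 'D': 40000}

# The same chart also fits the earlier election with 300,000 votes
print({p: int(c) for p, c in absolute_votes(shares, 300_000).items()})
# {'A': 175000, 'B': 50000, 'C': 25000, 'D': 50000}
```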
Congratulations. You just learned about pie charts. They're great for relative data, and they're wonderful for comparing which slice of the pie is bigger. In the next class we'll look at relative data again, and we'll pick up the touchy issue of gender discrimination in college admissions, using a study originally performed at UC Berkeley here in California. This will be a really deep statistical question, and I promise you'll be surprised by the result.