This is a page to share helpful short notes with fellow students. Thanks to @gabe for getting this started.

Contents

- 1 Revision Notes
- 2 3. Scatter plots
- 3 4. Bar charts
- 4 5. Pie charts
- 5 6. Python charts
- 6 7. Gender bias
- 7 8. Probability
- 8 9. Conditional Probability
- 8.1 Prior probability
- 8.2 Sensitivity
- 8.3 Specificity
- 8.4 Total probability

- 9 10. Bayes Rule
- 9.1 Prior
- 9.2 Sensitivity of test
- 9.3 Specificity of test
- 9.4 Joint probability
- 9.5 Normaliser
- 9.6 Posterior probability

- 10 13. Estimators
- 11 14. Mean, Median and Mode
- 12 15. Variance

**Scatter plots** are good when data is 2-dimensional. Not so good for 125 dimensions.

**Data is linear** when all data is on a straight line – this is rare, e.g. the price per square foot is the same for all houses (price $= k \times$ size, where $k$ is the constant price per square foot) or more generally:

- price $= k \times$ size $+$ constant

**Outliers** are data points which don’t fit easily with the majority of the data.

**Noise** is when data is randomly distributed around the mean (not good for scatter plots, use bar charts instead).

**Bar charts** are useful to group data into **intervals** and thus eliminate some random noise and show **global trends**. It is a cumulative tool.

**Histogram** is a special type of bar chart, which examines only one dimension of data, plotted against **frequency** on the y-axis, e.g. age histogram: plots how many people are in 0-10, 11-20, 21-30 etc brackets.

**Bar charts** and **Histograms** both aggregate the data.

**Pie charts** are useful to visualize **relative data** quickly and intuitively (e.g. Party A got 25% of the vote, Party B got 15%, Party C 2% & remainder didn’t vote). Proportions can be seen quickly.

Optional. You can use the Udacity web interface to plot charts. To make your own Python charts, you need NumPy and Matplotlib installed. Refer to Plotting Graphs with Python for information on that.

This will plot a histogram (a special type of bar chart) when run from the terminal:

```
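import numpy as np
from matplotlib import pyplot

# Draw 1000 samples from a standard normal distribution
x = np.random.randn(1000)
# Group the samples into 100 bins and plot how many fall in each
pyplot.hist(x, bins=100)
pyplot.show()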
```

The gender bias example shows that statistics is deep and easily manipulated. For example, when looking at each individual major, females seem to be favored, but on the aggregate data males appear to be favored.
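A minimal sketch of this reversal, using made-up admission counts. The majors, the counts and the `admissions` structure are all hypothetical, not the figures from class; they are chosen only so that females have the higher admission rate within each major while males have the higher rate overall.

```python
# Hypothetical admissions data: (applicants, admitted) per major and gender.
admissions = {
    "A": {"male": (900, 720), "female": (100, 85)},
    "B": {"male": (100, 10), "female": (900, 135)},
}


def rate(applicants, admitted):
    """Fraction of applicants who were admitted."""
    return admitted / applicants


def aggregate_rate(gender):
    """Admission rate for one gender with all majors pooled together."""
    applicants = sum(major[gender][0] for major in admissions.values())
    admitted = sum(major[gender][1] for major in admissions.values())
    return admitted / applicants


for major, by_gender in admissions.items():
    for gender, (applicants, admitted) in by_gender.items():
        print(f"major {major}, {gender}: {rate(applicants, admitted):.0%}")

print(f"aggregate male: {aggregate_rate('male'):.0%}")
print(f"aggregate female: {aggregate_rate('female'):.0%}")
```

The reversal happens here because most of the hypothetical male applicants chose the major that is easy to get into, while most of the female applicants chose the hard one.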

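A **fair** coin has a 50% chance of coming up heads: $P(H) = 0.5$.

A **loaded** coin has $P(H) \neq 0.5$, but $P(H) + P(T) = 1$ (always true).

Loaded coin with 2 heads: $P(H) = 1$ and $P(T) = 0$.

Separate flips are **independent events**, so their probabilities multiply:

$P(H,H)$ (two heads in a row) $= P(H) \cdot P(H) = 0.25$ for a fair coin.

**A truth table** shows all outcomes: (HH, HT, TH, TT). So, for a fair coin, the chance of each is $0.25$. The probabilities always sum to $1$.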
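As a rough illustration, the truth table for independent flips can be built by enumerating every outcome and multiplying the per-flip probabilities. The function name `truth_table` and its parameters are illustrative choices, not something from the course.

```python
from itertools import product


def truth_table(p_heads, flips):
    """Map each outcome string (e.g. 'HT') to its probability."""
    p = {"H": p_heads, "T": 1 - p_heads}
    table = {}
    for outcome in product("HT", repeat=flips):
        prob = 1.0
        for side in outcome:
            prob *= p[side]  # independent flips: probabilities multiply
        table["".join(outcome)] = prob
    return table


fair = truth_table(0.5, flips=2)
print(fair)                # HH, HT, TH and TT each have probability 0.25
print(sum(fair.values()))  # the rows of a truth table sum to 1
```

Calling `truth_table` with a different `p_heads` or more flips gives the table for a loaded coin or a longer sequence.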

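If $P(H) = p$, then $P(T) = 1 - p$. So $P(H,H) = p^2$ and $P(T,T) = (1 - p)^2$.

For 3 flips with a loaded coin, $P(H,H,H) = p^3$.

**Dependent events**: Event A influences event B, or B is dependent on A (e.g., becoming a professor depends on being smart).

Let's say half of the total population is smart: $P(\text{smart}) = 0.5$.

Notation: the probability of being a prof is $P(\text{prof})$, the prob of being a prof if smart is $P(\text{prof} \mid \text{smart})$ and the prob of being a prof if not smart is $P(\text{prof} \mid \neg\text{smart})$.

The **prior probability**, e.g., for a cancer test: $P(C) = 0.1$ and $P(\neg C) = 0.9$, i.e., 10% of the general population has cancer and 90% has no cancer.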

Test: $P(\text{Pos} \mid C) = 0.9$ and $P(\text{Neg} \mid C) = 0.1$

i.e. *if the person has cancer*, the test will be positive (i.e., correct) in 90% of cases – this is the **sensitivity** of the test. And negative (i.e., false negative or incorrect) in 10% of cases.

Test: $P(\text{Pos} \mid \neg C) = 0.2$ and $P(\text{Neg} \mid \neg C) = 0.8$

i.e., *if the person does not have cancer*, the test will be positive (i.e., false positive or incorrect) in 20% of cases. And negative (i.e., correct, all clear) in 80% of cases – the latter is the **specificity** of the test.

*Notes:*

Knowing the prior is not always easy in practice.

Sensitivity and specificity are distinct numbers. They don't have to be equal, or sum to 1.

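Truth table (sum = 1):

- $P(C, \text{Pos}) = 0.1 \times 0.9 = 0.09$
- $P(C, \text{Neg}) = 0.1 \times 0.1 = 0.01$
- $P(\neg C, \text{Pos}) = 0.9 \times 0.2 = 0.18$
- $P(\neg C, \text{Neg}) = 0.9 \times 0.8 = 0.72$

**Total probability** of a positive test: $P(\text{Pos}) = 0.09 + 0.18 = 0.27$.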
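A small sketch of how the truth table for this cancer test follows from the three numbers given above (10% prior, 90% sensitivity, 80% specificity). The variable names are illustrative choices.

```python
p_cancer = 0.1      # prior P(C)
sensitivity = 0.9   # P(Pos | C)
specificity = 0.8   # P(Neg | not C)

# Each row of the truth table is prior * conditional probability.
joint = {
    ("C", "Pos"): p_cancer * sensitivity,
    ("C", "Neg"): p_cancer * (1 - sensitivity),
    ("not C", "Pos"): (1 - p_cancer) * (1 - specificity),
    ("not C", "Neg"): (1 - p_cancer) * specificity,
}

for (state, result), prob in joint.items():
    print(f"P({state}, {result}) = {prob:.2f}")

# Total probability of a positive test: add up both rows with a positive result.
p_pos = joint[("C", "Pos")] + joint[("not C", "Pos")]
print(f"P(Pos) = {p_pos:.2f}")
```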

**e.g. two coins (x and y) in a bag** (slightly different from class example). Coin x is fair, so $P(H \mid x) = 0.5$, and coin y is loaded. There's an equal chance of picking either, so $P(x) = P(y) = 0.5$.

To solve, make a truth table with 3 columns (pick, flip, flip) and 8 rows:

(so 2 rows match)

Bayes Rule is named after Reverend Thomas Bayes. It is a very important theorem in probability.

**Cancer test example**

$P(C) = 0.01$ … i.e. 1% of the general population have this cancer

$P(\text{Pos} \mid C) = 0.9$ … i.e. for those with cancer: 90% are correctly diagnosed as positive

$P(\text{Neg} \mid \neg C) = 0.9$ … i.e. for those without cancer: 90% are correctly diagnosed as negative (so the test misdiagnoses 10% as positive)

Draw a Venn diagram of two intersecting circles (a small one for "cancer" (1%) and a larger one (about 10%) for "positive test"). You can see that, if in a random test of the general population 10% (of 99% of the population) are given a false positive (B), this number outweighs the 90% (of 1% of the population) given a true positive (A). A and B are the **joints**: $A = P(C, \text{Pos}) = 0.01 \times 0.9 = 0.009$ and $B = P(\neg C, \text{Pos}) = 0.99 \times 0.1 = 0.099$.

The probability of a positive result being correct in a screening of the general population is thus only $\frac{A}{A + B}$.

Conversely, the probability of a positive result being incorrect in a screening of the general population is $\frac{B}{A + B}$.

$(A + B)$ is the **normaliser**, $P(\text{Pos}) = 0.009 + 0.099 = 0.108$, so the positive-test probabilities sum to 1, and can be expressed as percentages.

$P(C \mid \text{Pos}) = \frac{0.009}{0.108} \approx 8.3\%$

i.e. in a random test of the general population, a positive result for this cancer only has 8.3% accuracy. This is a **posterior**. Similarly for a false positive …

$P(\neg C \mid \text{Pos}) = \frac{0.099}{0.108} \approx 91.7\%$

i.e. in a random test of the general population, a positive result has 91.7% chance of being wrong. As above, this is a posterior probability.

This may seem counterintuitive.

The main reason that the test gives a high proportion of false positives is that the actual incidence of this cancer in the general population (the prior) is very small (99% are cancer-free). So a small percentage (10%) of that 99% that test false-positive still make a large number of people, when compared with those with cancer (max 1%, even if the sensitivity is perfect).
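The steps of the cancer test example (joints, normaliser, posterior) can be sketched as a short function. The function name and argument names are illustrative; the numbers are the 1% prior, 90% sensitivity and 90% specificity from the example.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return (P(C | Pos), P(not C | Pos)) using Bayes Rule."""
    joint_true_pos = prior * sensitivity               # A = P(C, Pos)
    joint_false_pos = (1 - prior) * (1 - specificity)  # B = P(not C, Pos)
    normaliser = joint_true_pos + joint_false_pos      # A + B = P(Pos)
    return joint_true_pos / normaliser, joint_false_pos / normaliser


p_cancer, p_no_cancer = posterior_given_positive(0.01, 0.9, 0.9)
print(f"P(C | Pos) = {p_cancer:.1%}")
print(f"P(not C | Pos) = {p_no_cancer:.1%}")
```

Raising `prior` while keeping the test unchanged shows how strongly the posterior depends on how common the disease is.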

… more to follow …

**Maximum Likelihood Estimator (MLE)** looks at a given data set and uses it to make the best guess of future outcomes.

E.g. if past die throws show an equal number of ones, twos, threes etc., you can estimate that it is most likely that the die is fair and predict an equal likelihood of each number being thrown in the future: $P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$.

Extreme example: say past coin flips show only heads 100 times. This could be a fair coin, i.e. $P(H) = 0.5$, with a very unlikely outcome: $P = 0.5^{100} \approx 7.9 \times 10^{-31}$. But *most likely*, it is a weighted coin where $P(H) = 1$. It could also be almost any other type of loaded coin $P(H) = p$, where $0 < p < 1$, but the most likely is when $P(H) = 1$.

If you plot likelihood vs $p$ you get a curve with the MLE at the max point.
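A minimal sketch of the coin example: it evaluates the likelihood of the observed flips on a grid of candidate values for P(H) and picks the largest, then compares that with the observed fraction of heads. The function names and the grid of 101 candidates are illustrative choices.

```python
def likelihood(p_heads, heads, tails):
    """Probability of one particular sequence with these counts of heads and tails."""
    return p_heads ** heads * (1 - p_heads) ** tails


def mle(heads, tails):
    """Maximum likelihood estimate of P(H): the observed fraction of heads."""
    return heads / (heads + tails)


heads, tails = 100, 0
candidates = [i / 100 for i in range(101)]  # grid of possible P(H) values
best = max(candidates, key=lambda p: likelihood(p, heads, tails))

print(likelihood(0.5, heads, tails))  # fair coin: possible, but extremely unlikely
print(best, mle(heads, tails))        # the grid search and the formula agree
```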

Extreme, extreme example: say there is only 1 coin flip: a head. MLE gives $P(H) = 1$, i.e. a 100% weighted coin. This is silly, so in these cases, with small data sets, use …

**Laplacian Estimator**: Add fake data to smooth results

e.g. Dice throw data {3,4,6} becomes {3,4,6,1,2,3,4,5,6}, added one of each throw.

e.g. Coin flip {H} becomes {H,H,T}, which gives a better result. Now the estimate is $P(H) = \frac{2}{3}$ and $P(T) = \frac{1}{3}$, which still sum to 1 (**normalising**).
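The "add one fake observation of each outcome" idea can be sketched as follows. The function name `laplace_estimate` and the parameter `k` (how many fake observations to add per outcome) are illustrative; the notes above only describe adding one of each.

```python
from collections import Counter


def laplace_estimate(data, outcomes, k=1):
    """Estimate P(outcome) after adding k fake observations of each outcome."""
    counts = Counter(data)
    total = len(data) + k * len(outcomes)
    return {o: (counts[o] + k) / total for o in outcomes}


coin = laplace_estimate(["H"], outcomes=["H", "T"])
print(coin)  # P(H) = 2/3, P(T) = 1/3

die = laplace_estimate([3, 4, 6], outcomes=range(1, 7))
print(die)   # 3, 4 and 6 get 2/9 each; 1, 2 and 5 get 1/9 each
```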

**Mean** of {1,2,6} is 3 (because $\frac{1 + 2 + 6}{3} = 3$).

If $y_i = a x_i + b$, then $\mu_y = a \mu_x + b$.

In particular, if $y_i = x_i + b$, then $\mu_y = \mu_x + b$, i.e. adding a constant $b$ to all values moves the mean by $b$ too.

Similarly, if $y_i = a x_i$, then $\mu_y = a \mu_x$, i.e. multiplying all values by a constant $a$ multiplies the mean by $a$ too.

**Median** of {1,2,3,4,100} is 3 (the median is the middle value when the numbers are ordered). When there is an even number of terms, pick one of the two middle values, or take the mean of both. Useful in the typical house price example, as it effectively disregards a very expensive outlier.

**Mode** of {1,1,1,2,3,3,100} is 1 (the most frequent value; when there is more than one possibility, pick one). Useful for **multi-modal** or **bi-modal** data (where the data has "bumps", it picks the value corresponding to the top of the highest bump).
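These three examples can be checked with Python's standard `statistics` module. The second half is a small sketch of the shift-and-scale property of the mean; the constants `a` and `b` are arbitrary illustrative values.

```python
from statistics import mean, median, mode

print(mean([1, 2, 6]))                # 3
print(median([1, 2, 3, 4, 100]))      # 3
print(mode([1, 1, 1, 2, 3, 3, 100]))  # 1

# Scaling every value by a and adding b moves the mean to a * mean + b.
a, b = 2, 10
x = [1, 2, 6]
y = [a * value + b for value in x]
print(mean(y), a * mean(x) + b)       # 16 16
```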

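**Standard Deviation**: $\sigma = \sqrt{\frac{1}{N} \sum_i (x_i - \mu)^2}$

**Variance** (Measures spread of data away from the mean): $\sigma^2 = \frac{1}{N} \sum_i (x_i - \mu)^2$

We only need to know $N$, $\sum_i x_i$ (sum of values), and $\sum_i x_i^2$ (sum of squares) to compute the formula above, since $\sigma^2 = \frac{\sum_i x_i^2}{N} - \left(\frac{\sum_i x_i}{N}\right)^2$.

If $y_i = a x_i + b$, then $\sigma_y = |a| \sigma_x$, i.e. adding a constant $b$ to all values has no effect on the standard deviation. But multiplying all values by a constant $a$ multiplies the standard deviation by $|a|$ too. [extra detail: $|a|$ effectively means that whether $a$ is positive or negative, you take the positive value of $a$]

**Standard Score**

$z = \frac{x - \mu}{\sigma}$

Where $x$ is a point in a Gaussian distribution that you want to calculate a standard score for. $z$ is negative when $x$ is on the left of the mean, and zero when $x$ = mean.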
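A minimal sketch of computing the variance from just the count, the sum of values and the sum of squares, plus the standard score. It assumes the variance divides by N (the population form) rather than N − 1; the function names and the sample data are illustrative.

```python
from math import sqrt


def variance_from_sums(values):
    """Population variance from N, the sum of values and the sum of squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return total_sq / n - (total / n) ** 2


def std_dev(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(variance_from_sums(values))


def standard_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma


data = [1, 2, 6]                      # illustrative data with mean 3
scaled = [-2 * v + 5 for v in data]   # multiply by a = -2, add b = 5

print(variance_from_sums(data))
print(std_dev(scaled) / std_dev(data))       # close to |a| = 2
print(standard_score(1, 3, std_dev(data)))   # negative: 1 is left of the mean
```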