**These are draft notes extracted from subtitles. Feel free to improve them. Contributions are most welcome. Thank you!**


Contents

- 1 33. Flash Crash Example
- 1.1 01 Change Interval
- 1.2 02 Change Interval Solution
- 1.3 03 Outlier Frequency
- 1.4 04 Outlier Frequency Solution
- 1.5 05 New Interval
- 1.6 06 New Interval Solution
- 1.7 07 Calculate Change
- 1.8 08 Calculate Change Solution
- 1.9 09 Abnormal 1
- 1.10 10 Abnormal 1 Solution
- 1.11 11 Abnormal 2
- 1.12 12 Abnormal 2 Solution
- 1.13 13 Summary

So why don’t we, in this unit, look at a field example in which we apply statistical techniques to problems of real significance in stock trading. Here is a stock price for a specific day in 2010. The company we’re discussing is Apple, which started trading that day just above $250, but in the afternoon the stock all of a sudden went down to $200 and then back to the original price. As statisticians, we are concerned about this, because if we make trading decisions based on what’s happening, we might lose millions or billions of dollars.

This is one-in-a-decade abnormal behavior, and as statisticians we might care about writing software that finds these kinds of behaviors, so that we stop trading and don’t fall into the trap of selling our stock at an unfortunate rate just because something really, really strange is happening. And that’s a real example, in which people have lost millions if not billions of dollars.

On a typical trading day, Apple stock traded about 70,000 times, giving us 70,000 data points of prices during the day. What we’ll be looking at is the percentage change between adjacent trades, which we call Delta T. It is defined as the difference between two consecutive trading points, Xt+1 - Xt, normalized by Xt. So if Xt and Xt+1 are the same, this will be zero. If there is a big difference, this is a larger value. We know, for the day in question, what the mean and the standard deviation are for this percentage change. The mean is as small as -0.00074 percent, so if we leave off the percent it’s even a hundred times smaller, and the standard deviation is 0.01344 percent.

My first question for you is to compute the confidence interval, assuming that we can use a confidence interval to detect outliers in the data. Assume a symmetric 95% confidence interval, that is, one with five percent of the values falling outside it, for which our magic factor is 1.96. Compute for me, in percent, the confidence interval for the variable Delta T.

And the correct answer is minus 0.02708 and 0.0256, and this is obtained by taking the mean over here and adding plus or minus the standard deviation multiplied by 1.96.
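The arithmetic behind this answer can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the constant names are made up for this example, and the mean and standard deviation are the values quoted in the lecture, in percent.

```python
def confidence_interval(mean, std, z):
    """Return (lower, upper) = mean -/+ z * std."""
    half_width = z * std
    return mean - half_width, mean + half_width


# Values quoted in the lecture, both in percent.
MEAN_PCT = -0.00074
STD_PCT = 0.01344

lower, upper = confidence_interval(MEAN_PCT, STD_PCT, z=1.96)
print(f"{lower:.5f} {upper:.5f}")  # prints: -0.02708 0.02560
```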

*Note:*

There was some confusion regarding this quiz in the forums because a lot of people misunderstood the confidence interval formula:

$$\bar{x} \pm a \frac{\sigma}{\sqrt{n}}$$

The confidence interval was taught in a single, very specific scenario, so it is easy to forget the meaning of this formula. Remember that $a$ (the magic number, which is equal to 1.96 for 95%) is the stretch of your confidence interval in number of standard deviations that gives you the desired confidence (95% in the figure). The actual distance from the center of the interval to either edge is therefore equal to $a\sigma$.

So why is the formula $a\,\sigma/\sqrt{n}$? Remember the context where we were using this formula. We were calculating the confidence interval for an estimation of $\mu$ given a sample of size $n$ (the mean over a sample of size $n$). The instructor spent a few lectures (24. Confidence Interval, from lecture 15 to lecture 19) showing that the standard deviation in this case is equal to $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the original distribution (the distribution you are sampling from). In these confidence interval examples, we wanted to know how confident we were in the estimation of the true $\mu$ obtained from calculating the mean over the sample.

The mean and the variance in the quiz do represent estimations of the true $\mu$ and $\sigma^2$ of our distribution from a sample of size $n$ (these are MLEs of the true $\mu$ and $\sigma^2$). But what is being asked is not how confident we are in these estimations (as in previous confidence interval examples), e.g., "our estimation of the true mean obtained from calculating the mean over the sample should be inside the confidence interval with 95% confidence".

What is being asked in this quiz is: what is the interval for which we can say with 95% confidence that a future value of $\Delta_t$ will fall inside it (which is the same as saying that there is a 5% chance that a future value of $\Delta_t$ will fall outside it)? So we should use the standard deviation of the distribution itself (or, better said, the MLE of $\sigma$ obtained from the samples, since we don't know the true $\mu$ and $\sigma$ of this distribution) instead of the standard deviation of the mean over the sample ($\sigma/\sqrt{n}$). The solution, in percent, is therefore:

$$\mu \pm a\sigma = -0.00074 \pm 1.96 \times 0.01344$$

Resulting in the following confidence interval:

$$[-0.02708,\ 0.02560]$$
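To see how much the choice between the two standard deviations matters, here is a small sketch comparing the half-width of the interval for a single future value with the half-width of the interval for the sample mean. The variable names are illustrative only; the figures are the ones quoted in the lecture, with the sample size taken as the roughly 70,000 trades per day.

```python
import math

STD_PCT = 0.01344   # standard deviation of Delta T, in percent (from the lecture)
N_TRADES = 70_000   # approximate number of trades per day (from the lecture)
Z = 1.96            # magic factor for 95%

# Interval for a single future Delta T: uses the standard deviation itself.
half_width_single = Z * STD_PCT

# Interval for the estimated mean: uses the standard error, std / sqrt(n).
std_error_pct = STD_PCT / math.sqrt(N_TRADES)
half_width_mean = Z * std_error_pct

print(f"single value: +/- {half_width_single:.5f} %")
print(f"sample mean:  +/- {half_width_mean:.7f} %")
```

With these numbers the interval for the mean is a few hundred times narrower, which is why using it to flag individual trades would mark almost every trade as an outlier.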

So here is the real question. Suppose you use this confidence interval to detect abnormal behavior, where abnormal behavior means you go to your boss and complain, and he stops trading. How often during a day would you expect to detect abnormal behavior in your trading? Or, put differently, how frequently does this confidence interval trigger on any given single day?

And the answer is way too frequently: 3,500 times on just a single day. You’d be talking to your boss all the time, and the reason is very simple. You take the 70,000 trades over here and you apply the five percent that falls outside the interval over here, and if everything is normally distributed, as you believe it is, you get 3,500 outliers in expectation every day.
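The expected number of triggers can be checked with a short sketch. It assumes, as the lecture does, that Delta T is normally distributed; the helper `two_sided_tail_probability` is a name invented for this example and uses the standard-library `math.erfc`.

```python
import math


def two_sided_tail_probability(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2))


N_TRADES_PER_DAY = 70_000  # from the lecture

p_outside = two_sided_tail_probability(1.96)    # close to 0.05
expected_triggers = N_TRADES_PER_DAY * p_outside

print(f"P(outside) = {p_outside:.4f}")
print(f"expected triggers per day = {expected_triggers:.0f}")
```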

The trick is to change these numbers over here to numbers that trigger once in a decade, and this is as simple as changing the 1.96 to 6.5, which is really kind of a minor change, but this makes all the difference. The percentage now becomes so small, you get one trigger per decade. So let’s quickly compute what the numbers look like for 6.5.

And these are the numbers I get: minus 0.0881 and 0.08662. This is a wider confidence interval than before, which means we trigger less frequently and we find problems with our trading statistics less frequently.

So let’s now dive in on May 6, 2010, when the Apple stock crashed, and see if we could have detected that crash and saved ourselves from risky trades.
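As a rough sanity check on the "once in a decade" idea, the sketch below estimates how many triggers a 6.5 standard deviation threshold would produce over ten years under an exactly normal model. The figure of 252 trading days per year is an assumption made for this example, not a number from the lecture, and the helper name is invented.

```python
import math


def two_sided_tail_probability(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2))


TRADES_PER_DAY = 70_000        # from the lecture
TRADING_DAYS_PER_YEAR = 252    # assumption, not from the lecture
YEARS = 10

trades_per_decade = TRADES_PER_DAY * TRADING_DAYS_PER_YEAR * YEARS
p_outside = two_sided_tail_probability(6.5)
expected_per_decade = trades_per_decade * p_outside

print(f"trades per decade: {trades_per_decade:,}")
print(f"P(outside 6.5 std): {p_outside:.3e}")
print(f"expected triggers per decade: {expected_per_decade:.4f}")
```

Under these assumptions the expected count comes out below one trigger per decade, so the lecture's figure is best read as an order-of-magnitude statement. Real price changes also tend to have heavier tails than a normal distribution, so the real trigger rate would likely be higher than this model suggests.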

So, let me give you examples for two consecutive prices. Here’s one example where the stock changed from 286.85 to 286.83. First you calculate the Delta T, which, to remind you, was the difference of those two things, Xt+1 - Xt, normalized by Xt. So, why don’t you give me your calculation of Delta T in this box over here.

*Important:*

Note that a percent sign is missing from the quiz. The **answer has to be in percent**:

$$\Delta_t = \frac{x_{t+1} - x_t}{x_t} \times 100\%$$

I get 0.00697 percent in magnitude when I plug this in (the sign is negative, since the price moved down from 286.85 to 286.83).
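The calculation can be written as a tiny helper. The name `percent_change` is made up for this sketch; the formula is the lecture's Delta T, multiplied by 100 to express it in percent.

```python
def percent_change(x_t, x_next):
    """Delta T in percent: (x_next - x_t) / x_t * 100."""
    return (x_next - x_t) / x_t * 100.0


delta = percent_change(286.85, 286.83)
print(f"{delta:.5f} %")  # prints: -0.00697 %
```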

And now, here’s the important question: is this abnormal, yes or no?

And I would say no, because this value falls well within this confidence interval, between here and here.

I give you three more examples from that day. At some point we got from 247.6 to 247.55, some other time from 242.5 to 240.0, and then we got from 205.71 to 201, all within a fraction of a second. I’m just going to ask you the same question as before: do you consider these abnormal, relative to this confidence interval over here?

And it turns out the first one is still okay, but the last two would have been abnormal. The reason is that when you work out the Delta T in percent, you find that this one over here is roughly a 1% change, which is just too much for a single trade. And this one is an amazing 2.3% change within a single trade. Those clearly trigger the abnormality detection. Let’s look at the data.
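The check applied to these examples can be sketched as follows. The function names are invented for this illustration, and the bounds are the 6.5 standard deviation interval quoted earlier in the lecture, in percent.

```python
# 6.5-standard-deviation interval from the lecture, in percent.
LOWER_PCT = -0.0881
UPPER_PCT = 0.08662


def percent_change(x_t, x_next):
    """Delta T in percent: (x_next - x_t) / x_t * 100."""
    return (x_next - x_t) / x_t * 100.0


def is_abnormal(x_t, x_next, lower=LOWER_PCT, upper=UPPER_PCT):
    """True if the percentage change falls outside [lower, upper]."""
    delta = percent_change(x_t, x_next)
    return delta < lower or delta > upper


# The four price pairs mentioned in the lecture.
pairs = [(286.85, 286.83), (247.6, 247.55), (242.5, 240.0), (205.71, 201.0)]
flags = [is_abnormal(x_t, x_next) for x_t, x_next in pairs]

for (x_t, x_next), flag in zip(pairs, flags):
    print(f"{x_t} -> {x_next}: {percent_change(x_t, x_next):+.4f} %  abnormal={flag}")
```

The first two pairs stay inside the interval, while the last two fall far outside it.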

And this here is our Delta T, over time, for Apple’s stock on May 6, 2010. You’ll find that during the day it trades very normally, and all of a sudden you get this amazing oscillation here of the stock first going down and then going up again, and that is really indicative of something bizarre happening. Our statistic would have found it and might have saved you some money.

So, that’s my example from financial data. In the next few minutes we’re going to pick a different example.