The Instigator
tylergraham95
Pro (for)
Winning
12 Points
The Contender
bmbaker27
Con (against)
Losing
0 Points

Statistics: Outlier Data

Post Voting Period
The voting period for this debate has ended.
after 3 votes the winner is...
tylergraham95
Voting Style: Open
Point System: 7 Point
Started: 2/25/2014
Category: Science
Updated: 3 years ago
Status: Post Voting Period
Viewed: 1,027 times
Debate No: 46764
Debate Rounds (5)
Comments (1)
Votes (3)

 

tylergraham95

Pro

Resolved: Statistically speaking, outlier data is almost as significant as data that lies within the confidence interval when designing various social, economic, and socioeconomic systems, among others, that deal with massive amounts of data.


My Standard Boilerplate

Rules
Round 1- Acceptance, Historical Background, and Definitions only.
Round 2- Constructive Arguments only.
Round 3- Free choice.
Round 4- Rebuttals/Defences only.
Round 5- Closing Remarks. No new rebuttals/defences/responses/arguments may be made in this round. You may, however, make fresh cross examinations of points, using your own points.

Any rule violation constitutes an immediate loss of conduct points.

Forfeiting more than 1 round constitutes a full 7 point loss.

The BOP is shared.

Please, do not accept this challenge if you merely plan to challenge the premise of this debate.


Definitions

Outlier Data- Outlier data is data that lies far away from the mean (average) of the data. Specifically, outlier data will be defined as data that lies outside of a 95% confidence interval.

This means that outlier data will have a z-score with an absolute value greater than 1.96.

Suppose Statistic 1 has a mean of X and a standard deviation of Y.
Related data point Z is collected. If Z > X + 1.96Y or Z < X - 1.96Y, then Z is considered outlier data.
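The outlier rule above can be expressed as a short function. The following is a minimal Python sketch, not something from the original debate; the names `z_score`, `is_outlier`, and `Z_CRITICAL` and the example mean and standard deviation are hypothetical, and it assumes the mean X and standard deviation Y are already known.

```python
# Sketch of the outlier rule: a data point Z is an outlier when it lies more
# than 1.96 standard deviations from the mean X, i.e. |z| > 1.96 where
# z = (Z - X) / Y.

Z_CRITICAL = 1.96  # two-sided cutoff for the central 95% of a normal distribution


def z_score(value: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


def is_outlier(value: float, mean: float, std_dev: float) -> bool:
    """True if value falls outside mean +/- 1.96 * std_dev."""
    return abs(z_score(value, mean, std_dev)) > Z_CRITICAL


# Hypothetical example: mean X = 100, standard deviation Y = 15,
# so the cutoffs are 100 - 29.4 = 70.6 and 100 + 29.4 = 129.4.
for z in (95.0, 129.0, 130.0, 65.0):
    print(z, is_outlier(z, 100.0, 15.0))
```

Working with the z-score rather than the raw cutoffs keeps the rule the same for any mean and standard deviation.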


In layman's terms, outlier data is the incredibly unlikely, but still very possible.


Essentially, Pro will be arguing that when one is designing a system, one must consider the implications of outlier data just about as seriously as one must consider the implications of data that lies within the confidence interval.


bmbaker27

Con

I would like to first thank the Instigator for this topic, as I have been interested in statistics for a while now, and in its amazing power to help us better understand our world. I am familiar with the basic concepts outlined so far, and I am confident that within the time limit for each round I will be able to address any issues you raise using more advanced statistical concepts, so please do not feel the need to hold back. I will also preface this debate by saying that this is my second debate. My first is still in progress, but from the looks of it, this will be my first real debate on this site in the sense that the rules of debate are adhered to, so I am very excited to engage in this discussion with you.

I accept the rules and definitions as stated, with the caveat that Pro please clarify the definition of "significant" in the resolution (not statistical significance, where p < .05) so that I may better understand the parameters of the debate. Good luck to my opponent; I look forward to our debate.
Debate Round No. 1
tylergraham95

Pro

I would like to thank my opponent for accepting my challenge. I am glad we share a love of statistics! I am looking forward greatly to this debate.

For this debate, significance will be defined as the amount of focus and effort given to normal data/outcomes or to outlier data/outcomes.

Here is an example (using micro data; therefore, this example does not fit the resolution):

John sleeps 99% of all nights. The outcome of the normal data is that he needs a place to sleep. In designing his sleep system, he considers this outcome first and foremost and implements a bed into his system in order to achieve the outcome with the highest net benefit (comfortable, restful sleep).

We can see that, in this case, by giving normal data high significance when designing this system, he has created an effective (albeit simple) system.


Pro's Case


The inspiration for this debate came to me after reading excerpts from The Black Swan by Nassim Nicholas Taleb. The book's introduction tells the story of the discovery of black swans. For a very long time, scientists assumed that all swans were white. It wasn't until a statistical outlier was discovered that this assumption was overturned and rebuilt. Obviously, this is not a case of statistics where a traditional system is artificially designed, but the point stands. In this debate, I will typically refer to events/data points that are statistical outliers as "Black Swan Events" as an homage to the book.



I. Big Data, and Outliers


(A) Big Data means countless "trials."

There are many systems in the modern world that go through countless "trials" daily. Any action that most people perform daily, or at least often, goes through countless trials. For example: driving. Millions of driving trips are taken every day, and each of these trips can be considered a trial. The outcomes are categorical and numerous, but most can be lumped into two outcomes: "Safe Trip" and "Unsafe Trip." I'll talk more about this example in Part II.



(B) The Unlikely is Bound to Happen.

When you have countless "trials" occurring for any kind of action, no matter how unlikely a certain outcome may be, it is going to happen. For example, suppose Trial A has a 99.999999% chance of resulting in outcome TRUE and a 0.000001% chance of resulting in outcome FALSE. In terms of micro data, if you ran 100 trials, you would almost certainly get only TRUE outcomes. Therefore, you should probably design your system with only the implications of TRUE in mind. However, if 300,000,000 people perform the action relevant to Trial A twice a day, suddenly you have 600,000,000 trials per day, or 4,200,000,000 trials per week. This means that you will likely get about 42 FALSE outcomes every week.
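The Trial A arithmetic can be checked in a few lines. This Python sketch assumes that every trial is independent and has the same probability of outcome FALSE; the helper names `expected_count` and `prob_at_least_one` are illustrative, not from the original.

```python
import math


def expected_count(trials: int, p: float) -> float:
    """Expected number of occurrences of an outcome with probability p."""
    return trials * p


def prob_at_least_one(trials: int, p: float) -> float:
    """P(at least one occurrence) = 1 - (1 - p)**trials for independent trials.

    log1p/expm1 keep the result accurate when p is tiny.
    """
    return -math.expm1(trials * math.log1p(-p))


p_false = 0.000001 / 100            # 0.000001% as a probability, i.e. 1e-8
trials_per_day = 300_000_000 * 2    # 300 million people, twice a day
trials_per_week = trials_per_day * 7

print(expected_count(trials_per_week, p_false))    # about 42 per week
print(prob_at_least_one(trials_per_day, p_false))  # about 0.9975 in one day
print(prob_at_least_one(100, p_false))             # about 1e-6 for 100 trials
```

Under these assumptions, a FALSE outcome is all but certain to appear within a single day, even though a run of 100 trials almost never produces one.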

I am arguing that, in this case, systems designed around Trial A should be designed under the assumption that outcome FALSE will eventually occur.

These are "Black Swan" events.


II. Examples of "Black Swan" events

(A) 9-11 Terrorist Attack

The odds of being in a plane crash are 1 in 11,000,000 (1). An infinitesimally small percentage of flights end in a crash caused by terrorists. Considering, though, that there are around 10 million flights every year (2), it's not too surprising that the unlikely did occur. As I discussed in Point I, Subpoint B, the unlikely eventually happens. Obviously, 9-11 was a statistical outlier. If flight systems had been designed with this outlier in mind, the attacks could possibly have been prevented. Preventative systems on planes would have been a small price to pay to prevent the September 11th attacks.



(B) Car Safety

Another excellent example of outlier data significance is car safety. About 200 million people drive every day (3). As most people take at least 2 driving trips per day, we can estimate a minimum of about 400 million driving trips daily. There are 16,900 car wrecks daily (4). This means that about 0.004225% of driving trips end with the outcome I am labeling WRECK. Although our regular data results in the outcome SAFE TRIP, we design our car system with a very large significance attributed to the WRECK outcome. We add seatbelts, airbags, and other safety systems, and we drive more carefully. We do this both to reduce the probability of a wreck and to prepare ourselves for the unlikely.
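The wreck rate follows directly from the cited figures. This is a small Python sketch; the value of two trips per driver is the same minimum assumed in the paragraph above, and the variable names are illustrative.

```python
drivers_per_day = 200_000_000
trips_per_driver = 2          # assumed minimum trips per driver per day
wrecks_per_day = 16_900

trips_per_day = drivers_per_day * trips_per_driver
wreck_rate = wrecks_per_day / trips_per_day

print(f"{trips_per_day:,} trips per day")
print(f"wreck rate: {wreck_rate:.6%} of trips")
print(f"about 1 wreck per {1 / wreck_rate:,.0f} trips")
```

Any single trip is very unlikely to end in a wreck, yet the daily volume of trips still turns that tiny rate into thousands of wrecks.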



(C) Investing Models

In investing, the traditional strategy is to build your portfolio on what is most likely. In his book Antifragile, Taleb outlines an investing strategy based on what is unlikely. He describes three kinds of systems: fragile, robust, and antifragile. A fragile system, when put under stress, will fail. A robust system, when put under stress, will remain strong or falter only slightly. An antifragile system, when put under stress, will thrive or improve. Taleb encourages building a portfolio composed only of robust and antifragile systems. This is because, although extreme stressors (such as recessions) are incredibly unlikely, the sheer volume of trials means these stressors are bound to happen. If you account for these stressors when designing your investing system, then when they do eventually pop up, you will make much more money than someone who invests only in fragile systems.



(D) Medicine, and False Positives

In medicine, false positives are a very common problem in testing for illnesses. Suppose that an illness occurs in 1 in 1,000 people (0.1%), and there is a test for this illness that gives a false positive only 1% of the time. Having a 1-in-1,000 illness is a statistical outlier, but the system is not designed to account for this. Because about 1 in 100 healthy people receive a false positive, while only 1 in 1,000 people have the illness, testing 1,000 people would produce about 10 false positives (1% of the 999 healthy people) and 1 legitimate positive, assuming the test catches every real case. In other words, roughly 91% of all positives would be false. If the system were refined to account for this statistical outlier, the rate of false positives would be much lower.
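The false-positive figures can be worked out with Bayes' rule. This Python sketch assumes the test never misses a real case (100% sensitivity); the function name `false_share_of_positives` and the population of 1,000 are illustrative.

```python
def false_share_of_positives(prevalence: float, false_positive_rate: float,
                             sensitivity: float = 1.0) -> float:
    """Fraction of positive results that are false positives (1 - PPV)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return false_pos / (true_pos + false_pos)


population = 1000
prevalence = 1 / 1000    # illness occurs in 1 in 1,000 people
fpr = 0.01               # 1% of healthy people test positive

sick = population * prevalence                  # 1 person
false_positives = (population - sick) * fpr     # about 9.99 people
share = false_share_of_positives(prevalence, fpr)

print(f"expected false positives: {false_positives:.2f}")
print(f"share of positives that are false: {share:.1%}")
```

Because healthy people vastly outnumber sick ones, even a 1% false-positive rate swamps the single true positive.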


Sources
1. http://www.artofmanliness.com...
2. http://www.cnn.com...
3. http://www.ask.com...
4. http://www.ask.com...
bmbaker27

Con

bmbaker27 forfeited this round.
Debate Round No. 2
tylergraham95

Pro

Forward all points.
bmbaker27

Con

bmbaker27 forfeited this round.
Debate Round No. 3
tylergraham95

Pro

Forward all points.
bmbaker27

Con

bmbaker27 forfeited this round.
Debate Round No. 4
tylergraham95

Pro

Forward all points.
bmbaker27

Con

bmbaker27 forfeited this round.
Debate Round No. 5
1 comment has been posted on this debate.
Posted by The_Scapegoat_bleats 3 years ago
I agree with Con: "significant" IN STATISTICAL TERMS is already taken as "within the confidence interval", which would be a clear contradiction, as something outside the confidence interval CANNOT be "as significant" as something inside it. This makes no sense.
In LAYMAN's terms, I agree with Pro.
3 votes have been placed for this debate.
Vote Placed by Geogeer 3 years ago
Agreed with before the debate (0 points): Tied
Agreed with after the debate (0 points): Tied
Who had better conduct (1 point): tylergraham95
Had better spelling and grammar (1 point): Tied
Made more convincing arguments (3 points): tylergraham95
Used the most reliable sources (2 points): Tied
Total points awarded: tylergraham95 4, bmbaker27 0
Reasons for voting decision: Con Forfeited granting pro conduct and argument points.
Vote Placed by Zarroette 3 years ago
Agreed with before the debate (0 points): Tied
Agreed with after the debate (0 points): Tied
Who had better conduct (1 point): tylergraham95
Had better spelling and grammar (1 point): Tied
Made more convincing arguments (3 points): tylergraham95
Used the most reliable sources (2 points): Tied
Total points awarded: tylergraham95 4, bmbaker27 0
Reasons for voting decision: Pro's arguments went completely uncontested, largely due to Con's forfeit.
Vote Placed by Krazzy_Player 3 years ago
Agreed with before the debate (0 points): Tied
Agreed with after the debate (0 points): Tied
Who had better conduct (1 point): tylergraham95
Had better spelling and grammar (1 point): Tied
Made more convincing arguments (3 points): tylergraham95
Used the most reliable sources (2 points): Tied
Total points awarded: tylergraham95 4, bmbaker27 0
Reasons for voting decision: FF