
What is **Statistics**?

Statistics is the science of making decisions from data. It involves the collection, organization, analysis, interpretation, and presentation of data.

A statistician first formulates a problem to be solved or analyzed using statistics, then collects data relevant to the problem through surveys (or other means), and finally applies statistical techniques to reach conclusions (or results). Note that in general there is no single correct answer to the problem being studied; statistics only provides a reliable means of analyzing it.

**Population** - It is the universal set of individuals under study.

**Sample** - It is a subset of a population which is used to learn about the population.

Eg - If we consider all the people living in the USA as a population, then the people living in California can be considered a sample.

*Note* : Population and sample are relative terms. The thing to remember is that a sample is a subset of a population. A sample may be chosen for study in a survey because it is difficult (and expensive!) to perform the survey on the whole population.
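The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is a list of made-up ages, and the sample is drawn at random with the standard library's `random.sample`, which is one possible way of choosing a subset, not the only one.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up ages (not real data).
population = [random.randint(18, 90) for _ in range(10_000)]

# A simple random sample: a subset of the population, chosen without replacement.
sample = random.sample(population, k=500)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population size: {len(population)}, mean age: {population_mean:.2f}")
print(f"sample size:     {len(sample)}, mean age: {sample_mean:.2f}")
```

Surveying 500 individuals is far cheaper than surveying 10,000, and the sample mean still tends to land close to the population mean.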

3 important components of a good survey:

- A good sample size
- A representative sample
- Methodologically sound research

**Construct** : A variable under study for which we have no direct means of definition or measurement.

Since a construct is not directly defined, we need to define it for the purpose of statistical study. Such a definition is called an **operational definition**.

Eg - We can consider the intelligence of a human being as a construct and the IQ (intelligence quotient) score as its operational definition. Here is a related discussion on the topic.

**Independent (or Predictor) variable**

An independent variable is the variable you have control over: the one you can choose and manipulate.

It is the variable which affects the value of the **dependent** variable (or the outcome).

**Sampling Error**

It is the difference between the value of a sample statistic and the value of the corresponding population parameter. In simple words, it gives us a measure of how *close* (or far) the value of the sample statistic is to the true population value. This error measure is of vital importance in a statistical analysis because it lets us quantify how far the results obtained for a sample are applicable to the whole population.
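A small simulation can make the idea of sampling error concrete. The sketch below assumes a hypothetical population of heights generated with `random.gauss` (made-up data) and uses the mean as the statistic of interest; the helper name `mean_abs_sampling_error` is introduced here purely for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 20,000 made-up heights in cm.
population = [random.gauss(170, 10) for _ in range(20_000)]
population_mean = statistics.fmean(population)

def mean_abs_sampling_error(sample_size, repeats=200):
    """Average |sample mean - population mean| over many random samples."""
    errors = []
    for _ in range(repeats):
        sample = random.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)

# Compare the typical sampling error for a few sample sizes.
errors_by_size = {n: mean_abs_sampling_error(n) for n in (10, 100, 1000)}
for n, err in errors_by_size.items():
    print(f"sample size {n:>4}: typical sampling error = {err:.3f} cm")
```

Under these assumptions the typical error shrinks as the sample grows, which is one reason a good sample size matters.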

**Experiments**

Controlled experiments :

Placebo Effect : A placebo, as used in research, is an inactive substance or procedure used as a control in an experiment. The placebo effect is the measurable, observable, or felt improvement in health that is not attributable to an actual treatment.

Single and Double Blind :

Random Assignment :

*Correlation does not imply Causation* is an important concept to remember when drawing conclusions in statistical studies. It means that just because two variables (the independent and the dependent) are found to be correlated, we cannot conclude that the independent variable is the cause of the change in the dependent variable.

An example illustrates this: one theory contends that no two countries that both have a McDonald's outlet have ever gone to war with each other (source). Here "two countries not going to war" and "presence of a McDonald's outlet in both countries" are correlated, but the presence of the outlets cannot be considered the cause of the countries not engaging in war!
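One common way a correlation arises without causation is a hidden third variable that influences both measured variables. The sketch below is a simulation on made-up data, not an analysis of the example above: `z` is a hypothetical hidden factor, `x` and `y` each depend on `z` but never on each other, and `pearson_r` is a hand-written helper for the Pearson correlation coefficient.

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is reproducible

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

# Hidden third variable (hypothetical); it drives both x and y.
z = [random.gauss(0, 1) for _ in range(2000)]

# x and y are each "z plus independent noise"; neither one affects the other.
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r = pearson_r(x, y)
print(f"correlation between x and y: {r:.2f}")
```

The two variables come out strongly correlated even though, by construction, changing `x` would do nothing to `y`.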