Hi everyone, today I am starting a series of articles on the basics of Statistics that every statistician should know. I will cover the following topics in this series:
- Confidence Intervals
- Hypothesis Testing
- Simple Linear Regression
- Multiple Linear Regression
Today, let’s start with Confidence Intervals.
What is a Confidence Interval?
In Statistics, we often try to estimate a parameter of a population from a sample drawn from that population. One example is the population mean, which we estimate using the mean of a smaller sample. However, reporting only the sample mean is often inadequate, because a single number says nothing about how far the estimate may be from the true parameter. We would rather report a region that captures the true population parameter with a known probability.
Interpretation of a Confidence Interval: saying that (LB, UB) is a $100(1-\alpha)\%$ confidence interval for the true parameter $\theta$ means that if we sampled many times and built a confidence interval from each sample, then about $100(1-\alpha)\%$ of those intervals would contain the true parameter $\theta$.
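The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration under assumed values (a normal population with mean 10 and known standard deviation 2, samples of size 25, all chosen arbitrarily): it builds a 95% interval from each of many samples and reports the fraction of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, mean


def coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean.

    All default values are illustrative assumptions, not taken from real data.
    """
    rng = random.Random(seed)
    # Two-sided standard normal quantile, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the procedure over many repetitions.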
Some important Cases:
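- Population Mean: Suppose we have data $X_1, \dots, X_n$. We want to find a $100(1-\alpha)\%$ Confidence Interval for the population mean $\mu$.
1. $\sigma^2$ (Population Variance) known:
If $X_1, \dots, X_n$ can be approximated by a normal distribution: $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
If normality cannot be assumed, the same formula still holds approximately, by the Central Limit Theorem, but only if n is large enough (say n > 30). For small n, different approaches have to be taken.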
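2. $\sigma^2$ (Population Variance) unknown:
If $X_1, \dots, X_n$ can be approximated by a normal distribution: $\bar{X} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$ (where $s$ is the sample standard deviation, i.e. $s^2$ is the sample variance).
If normality cannot be assumed, but n is large enough (say n > 30), then we can give the Confidence Interval by $\bar{X} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$ (using the Central Limit Theorem). However, for small n, different approaches have to be taken.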
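As a minimal sketch of the normal-quantile intervals for a mean, the hypothetical helper `mean_ci_z` below computes "sample mean ± z × sd / √n". It uses the known population standard deviation when one is passed in, and otherwise falls back to the sample standard deviation, which is only reasonable for large n. The function name and the data are made up for illustration.

```python
from statistics import NormalDist, mean, stdev


def mean_ci_z(data, level=0.95, sigma=None):
    """Normal-quantile confidence interval for a population mean.

    sigma: known population standard deviation. If None, the sample
    standard deviation is used instead (large-sample approximation).
    """
    n = len(data)
    xbar = mean(data)
    sd = sigma if sigma is not None else stdev(data)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / n ** 0.5
    return xbar - half_width, xbar + half_width


# Made-up measurements, with sigma assumed known and equal to 1.
print(mean_ci_z([4.0, 5.0, 6.0], sigma=1.0))
```

For a small normal sample with unknown variance, the z quantile would be replaced by a t quantile with n - 1 degrees of freedom. The Python standard library does not provide one; `scipy.stats.t.ppf` is one option if SciPy is available.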
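- Proportion: Confidence Interval for a proportion p. Suppose we observe x successes in n trials, so that $\hat{p} = x/n$, and write $z = z_{\alpha/2}$.
1. Wald: $\hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
2. Wilson Score: $\frac{1}{1 + z^2/n} \left( \hat{p} + \frac{z^2}{2n} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}} \right)$
3. Agresti-Coull: In Wald's equation, replace x with $x + z^2/2$ and n with $n + z^2$. For a 95% interval ($z \approx 1.96$, so $z^2 \approx 4$), replace x with x+2, and n with n+4.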
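The three proportion intervals can be compared side by side. This is a sketch using the standard textbook forms: Wald is p̂ ± z·√(p̂(1−p̂)/n), Wilson recentres and rescales that interval using z²/n, and Agresti-Coull applies the Wald formula after replacing x with x + z²/2 and n with n + z². The function names are hypothetical and the counts are invented.

```python
from math import sqrt
from statistics import NormalDist


def _z(level):
    """Two-sided standard normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def wald_ci(x, n, level=0.95):
    p = x / n
    half = _z(level) * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_ci(x, n, level=0.95):
    z = _z(level)
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def agresti_coull_ci(x, n, level=0.95):
    z = _z(level)
    # Wald formula on the adjusted counts x + z^2/2 and n + z^2.
    return wald_ci(x + z ** 2 / 2, n + z ** 2, level)


# Invented example: 50 successes in 100 trials.
for f in (wald_ci, wilson_ci, agresti_coull_ci):
    print(f.__name__, f(50, 100))
```

With x = 0 the Wald interval collapses to the single point 0, while the other two still give an interval of positive width, which is one reason they are often preferred for small samples or extreme proportions.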
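- Difference of Means: Let $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$ be two independent samples. Population parameter: $\mu_1 - \mu_2$, Estimate: $\bar{X} - \bar{Y}$. Then we can get the Confidence Interval as follows:
1. Variances known, normality can be assumed: $(\bar{X} - \bar{Y}) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
2. For large $n_1, n_2$: $(\bar{X} - \bar{Y}) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $s_1, s_2$ are the sample standard deviations.
3. Normality can be assumed, but variances are unknown: $(\bar{X} - \bar{Y}) \pm t_{\alpha/2,\, \nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $\nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$.
4. Normality can be assumed, and the variances are unknown but can be assumed to be equal: $(\bar{X} - \bar{Y}) \pm t_{\alpha/2,\, n_1 + n_2 - 2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$ is the pooled variance.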
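For two independent samples, the large-sample interval and the two quantities that the t-based cases rely on can be sketched as follows. The helper names are hypothetical and the data are invented. `welch_df` uses the Welch-Satterthwaite approximation for the degrees of freedom, and `pooled_variance` is the weighted average of the two sample variances. The t quantile itself is left out because the standard library has none.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_means_ci_large(xs, ys, level=0.95):
    """Large-sample interval for mu1 - mu2 using sample variances."""
    diff = mean(xs) - mean(ys)
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se


def welch_df(xs, ys):
    """Welch-Satterthwaite degrees of freedom (unequal variances)."""
    a = variance(xs) / len(xs)
    b = variance(ys) / len(ys)
    return (a + b) ** 2 / (a ** 2 / (len(xs) - 1) + b ** 2 / (len(ys) - 1))


def pooled_variance(xs, ys):
    """Pooled variance, assuming both populations share one variance."""
    n1, n2 = len(xs), len(ys)
    return ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)


# Invented data, far too small for the large-sample formula in practice.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(diff_means_ci_large(xs, ys), welch_df(xs, ys), pooled_variance(xs, ys))
```

With samples this small, the z quantile in `diff_means_ci_large` would normally be swapped for a t quantile using either `welch_df` or n1 + n2 - 2 degrees of freedom.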
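- Difference of Proportions: Let $\hat{p}_1$ and $\hat{p}_2$ be the two sample proportions. For paired data, let B = # observations where the first sample is a success and the second sample is a failure, and C = # observations where the first sample is a failure and the second sample is a success. Then the Confidence Interval is given by: $\hat{d} \pm z_{\alpha/2} \, SE$, where $\hat{d} = \hat{p}_1 - \hat{p}_2$, and we can get the standard error as follows:
1. Independent Sampling: $SE = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$
2. Paired Sampling: with n pairs, $\hat{d} = \frac{B - C}{n}$ and $SE = \frac{1}{n} \sqrt{B + C - \frac{(B - C)^2}{n}}$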
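Both cases can be sketched with the usual Wald-type standard errors. This is an assumption about which formulas are intended. For independent samples the standard error is √(p̂₁(1−p̂₁)/n₁ + p̂₂(1−p̂₂)/n₂). For n paired observations with discordant counts B and C, the estimate is (B − C)/n with standard error √(B + C − (B − C)²/n)/n. Function names and counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def _z(level):
    """Two-sided standard normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def diff_prop_ci_independent(x1, n1, x2, n2, level=0.95):
    """Wald interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - _z(level) * se, d + _z(level) * se


def diff_prop_ci_paired(b, c, n, level=0.95):
    """Wald interval for p1 - p2 from n paired observations.

    b: pairs with success in the first sample and failure in the second.
    c: pairs with failure in the first sample and success in the second.
    """
    d = (b - c) / n
    se = sqrt(b + c - (b - c) ** 2 / n) / n
    return d - _z(level) * se, d + _z(level) * se


# Invented counts for illustration.
print(diff_prop_ci_independent(50, 100, 40, 100))
print(diff_prop_ci_paired(15, 5, 100))
```

Only the discordant pairs (B and C) enter the paired formula. Pairs where both observations agree affect the result only through n.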
Well, that was a summary of the main types of confidence intervals we need in statistical analysis. Next time, I will talk about Hypothesis Testing, and more. Till then, Good-Bye.